Selective Reviews of Bandit Problems in AI via a Statistical View

Abstract

Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration-exploitation trade-offs. Additionally, we explore K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
