Exploiting Correlation in Finite-Armed Structured Bandits

  • 2018-10-18 17:01:00
  • Samarth Gupta, Gauri Joshi, Osman Yağan
  • 3

Abstract

We consider a correlated multi-armed bandit problem in which rewards of armsare correlated through a hidden parameter. Our approach exploits thecorrelation among arms to identify some arms as sub-optimal and pulls them only$\mathcal{O}(1)$ times. This results in significant reduction in cumulativeregret, and in fact our algorithm achieves bounded (i.e., $\mathcal{O}(1)$)regret whenever possible; explicit conditions needed for bounded regret to bepossible are also provided by analyzing regret lower bounds. We propose severalvariants of our approach that generalize classical bandit algorithms such asUCB, Thompson sampling, KL-UCB to the structured bandit setting, andempirically demonstrate their superiority via simulations.

 

Quick Read (beta)

loading the full paper ...