Abstract
We consider the combinatorial volatile Gaussian process (GP) semi-banditproblem. Each round, an agent is provided a set of available base arms and mustselect a subset of them to maximize the long-term cumulative reward. We studythe Bayesian setting and provide novel Bayesian cumulative regret bounds forthree GP-based algorithms: GP-UCB, GP-BayesUCB and GP-TS. Our bounds extendprevious results for GP-UCB and GP-TS to the infinite, volatile andcombinatorial setting, and to the best of our knowledge, we provide the firstregret bound for GP-BayesUCB. Volatile arms encompass other widely consideredbandit problems such as contextual bandits. Furthermore, we employ ourframework to address the challenging real-world problem of onlineenergy-efficient navigation, where we demonstrate its effectiveness compared tothe alternatives.