Combinatorial Sleeping Bandits with Fairness Constraints

Abstract

The multi-armed bandit (MAB) model has been widely adopted for studying manypractical optimization problems (network resource allocation, ad placement,crowdsourcing, etc.) with unknown parameters. The goal of the player here is tomaximize the cumulative reward in the face of uncertainty. However, the basicMAB model neglects several important factors of the system in many real-worldapplications, where multiple arms can be simultaneously played and an arm couldsometimes be "sleeping". Besides, ensuring fairness is also a key designconcern in practice. To that end, we propose a new Combinatorial Sleeping MABmodel with Fairness constraints, called CSMAB-F, aiming to address theaforementioned crucial modeling issues. The objective is now to maximize thereward while satisfying the fairness requirement of a minimum selectionfraction for each individual arm. To tackle this new problem, we extend anonline learning algorithm, UCB, to deal with a critical tradeoff betweenexploitation and exploration and employ the virtual queue technique to properlyhandle the fairness constraints. By carefully integrating these two techniques,we develop a new algorithm, called Learning with Fairness Guarantee (LFG), forthe CSMAB-F problem. Further, we rigorously prove that not only LFG isfeasibility-optimal, but it also has a time-average regret upper bounded by$\frac{N}{2\eta}+\frac{\beta_1\sqrt{mNT\log{T}}+\beta_2 N}{T}$, where N is thetotal number of arms, m is the maximum number of arms that can besimultaneously played, T is the time horizon, $\beta_1$ and $\beta_2$ areconstants, and $\eta$ is a design parameter that we can tune. Finally, weperform extensive simulations to corroborate the effectiveness of the proposedalgorithm. Interestingly, the simulation results reveal an important tradeoffbetween the regret and the speed of convergence to a point satisfying thefairness constraints.

Quick Read (beta)

loading the full paper ...