Adversarial Bandits with Knapsacks

Abstract

We consider Bandits with Knapsacks (henceforth, BwK), a general model formulti-armed bandits under supply/budget constraints. In particular, a banditalgorithm needs to solve a well-known knapsack problem: find an optimal packingof items into a limited-size knapsack. The BwK problem is a commongeneralization of numerous motivating examples, which range from dynamicpricing to repeated auctions to dynamic ad allocation to network routing andscheduling. While the prior work on BwK focused on the stochastic version, wepioneer the other extreme in which the outcomes can be chosen adversarially.This is a considerably harder problem, compared to both the stochastic versionand the "classic" adversarial bandits, in that regret minimization is no longerfeasible. Instead, the objective is to minimize the competitive ratio: theratio of the benchmark reward to the algorithm's reward. We design an algorithm with competitive ratio O(log T) relative to the bestfixed distribution over actions, where T is the time horizon; we also prove amatching lower bound. The key conceptual contribution is a new perspective onthe stochastic version of the problem. We suggest a new algorithm for thestochastic version, which builds on the framework of regret minimization inrepeated games and admits a substantially simpler analysis compared to priorwork. We then analyze this algorithm for the adversarial version and use it asa subroutine to solve the latter.

Quick Read (beta)

loading the full paper ...