No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!

Abstract

We study online decision making problems under resource constraints, whereboth reward and cost functions are drawn from distributions that may changeadversarially over time. We focus on two canonical settings: $(i)$ onlineresource allocation where rewards and costs are observed before actionselection, and $(ii)$ online learning with resource constraints where they areobserved after action selection, under full feedback or bandit feedback. It iswell known that achieving sublinear regret in these settings is impossible whenreward and cost distributions may change arbitrarily over time. To address thischallenge, we analyze a framework in which the learner is guided by a spendingplan--a sequence prescribing expected resource usage across rounds. We designgeneral (primal-)dual methods that achieve sublinear regret with respect tobaselines that follow the spending plan. Crucially, the performance of ouralgorithms improves when the spending plan ensures a well-balanced distributionof the budget across rounds. We additionally provide a robust variant of ourmethods to handle worst-case scenarios where the spending plan is highlyimbalanced. To conclude, we study the regret of our algorithms when competingagainst benchmarks that deviate from the prescribed spending plan.

Quick Read (beta)

loading the full paper ...