The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin

Abstract

We study batched nonparametric contextual bandits under a margin conditionwhen the margin parameter $\alpha$ is unknown. To capture the statistical priceof this ignorance, we introduce the regret inflation criterion, defined as theratio between the regret of an adaptive algorithm and that of an oracle knowing$\alpha$. We show that the optimal regret inflation grows polynomial with thehorizon $T$, with exponent precisely given by the value of a convexoptimization problem involving the dimension, smoothness, and batch budget.Moreover, the minimizers of this optimization problem directly prescribe thebatch allocation and exploration strategy of a rate-optimal algorithm. Buildingon this principle, we develop RoBIN (RObust batched algorithm with adaptiveBINning), which achieves the optimal regret inflation up to logarithmicfactors. These results reveal a new adaptivity barrier: under batching,adaptation to an unknown margin parameter inevitably incurs a polynomialpenalty, sharply characterized by a variational problem. Remarkably, thisbarrier vanishes when the number of batches exceeds $\log \log T$; with only adoubly logarithmic number of updates, one can recover the oracle regret rate upto polylogarithmic factors.

Quick Read (beta)

loading the full paper ...