Online Learning with Abstention

Abstract

We present an extensive study of the key problem of online learning wherealgorithms are allowed to abstain from making predictions. In the adversarialsetting, we show how existing online algorithms and guarantees can be adaptedto this problem. In the stochastic setting, we first point out a bias problemthat limits the straightforward extension of algorithms such as UCB-N totime-varying feedback graphs, as needed in this context. Next, we give a newalgorithm, UCB-GT, that exploits historical data and is adapted to time-varyingfeedback graphs. We show that this algorithm benefits from more favorableregret guarantees than a possible, but limited, extension of UCB-N. We furtherreport the results of a series of experiments demonstrating that UCB-GT largelyoutperforms that extension of UCB-N, as well as more standard baselines.

Quick Read (beta)

loading the full paper ...