On Bits and Bandits: Quantifying the Regret-Information Trade-off

Abstract

In interactive decision-making tasks, information can be acquired by direct interactions, through receiving indirect feedback, and from external knowledgeable sources. We examine the trade-off between the information an agent accumulates and the regret it suffers. We show that information from external sources, measured in bits, can be traded off for regret, measured in reward. We invoke information-theoretic methods for obtaining regret lower bounds that also allow us to easily re-derive several known lower bounds. We then generalize a variety of interactive decision-making tasks with external information to a new setting. Using this setting, we introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates. These lower bounds also prove the near-optimality of Thompson sampling for Bayesian problems. Finally, we demonstrate the utility of these bounds in improving the performance of a question-answering task with large language models, allowing us to obtain valuable insights.
