Active Reinforcement Learning: Observing Rewards at a Cost

Abstract

Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.
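
To make the setting concrete, the following is a minimal sketch of an ARL multi-armed bandit in Python. It is an illustration only and not one of the heuristics proposed in the paper: the class ActiveBandit, the function run_fixed_query_budget, the Bernoulli reward model, and the "query each arm a fixed number of times, then stop querying and exploit" rule are all assumptions introduced here. The sketch shows the defining feature of ARL: the reward is always received, but it is only observed when the agent pays the query cost c.

```python
import random


class ActiveBandit:
    """Hypothetical Bernoulli bandit with costly reward observation."""

    def __init__(self, means, cost, seed=0):
        self.means = list(means)   # success probability of each arm
        self.cost = cost           # query cost c > 0
        self.rng = random.Random(seed)

    def step(self, arm, query):
        """Pull `arm`; return (observed_reward_or_None, net_return)."""
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        observed = reward if query else None        # hidden unless queried
        net = reward - (self.cost if query else 0.0)
        return observed, net


def run_fixed_query_budget(env, horizon, budget_per_arm):
    """Assumed baseline: query each arm `budget_per_arm` times, then
    commit to the empirically best arm without querying again."""
    if budget_per_arm < 1:
        raise ValueError("budget_per_arm must be at least 1")
    n_arms = len(env.means)
    counts = [0] * n_arms      # number of observed rewards per arm
    sums = [0.0] * n_arms      # sum of observed rewards per arm
    total, n_queries = 0.0, 0
    for _ in range(horizon):
        under_sampled = [a for a in range(n_arms) if counts[a] < budget_per_arm]
        if under_sampled:
            arm, query = under_sampled[0], True
        else:
            # Exploit the best empirical mean; no further information is bought.
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
            query = False
        observed, net = env.step(arm, query)
        total += net
        if query:
            counts[arm] += 1
            sums[arm] += observed
            n_queries += 1
    return total, n_queries


if __name__ == "__main__":
    env = ActiveBandit(means=[0.2, 0.8], cost=0.3, seed=0)
    total, n_queries = run_fixed_query_budget(env, horizon=100, budget_per_arm=5)
    print(f"net return: {total:.1f}, queries paid: {n_queries}")
```

Under this assumed baseline the agent pays for exactly budget_per_arm observations per arm (when the horizon is long enough), so the trade-off named in the abstract appears directly: a larger budget improves the estimate of the best arm but subtracts more query cost from the return. Choosing that budget well is a simple instance of valuing reward information.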
