Abstract
Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.
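To make the interaction protocol concrete, the following is a minimal sketch of an ARL multi-armed bandit, not the authors' implementation. The class name `ActiveBandit`, the Bernoulli reward model, and the parameter names `means` and `query_cost` are assumptions chosen for illustration. The sketch shows only the setting described above: every pull generates a reward that counts toward the return, but the agent sees that reward only if it pays the query cost c.

```python
import random


class ActiveBandit:
    """Hypothetical Bernoulli bandit in which observing a reward costs c > 0."""

    def __init__(self, means, query_cost, seed=0):
        if query_cost <= 0:
            raise ValueError("query_cost must be positive (c > 0)")
        self.means = list(means)        # success probability of each arm (assumed model)
        self.query_cost = query_cost    # the query cost c
        self.rng = random.Random(seed)
        self.total_return = 0.0         # true return, never shown to the agent directly

    def step(self, arm, query):
        """Pull `arm`; return the reward if `query` is True, otherwise None."""
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        # The reward always accrues; querying additionally subtracts the cost c.
        self.total_return += reward - (self.query_cost if query else 0.0)
        return reward if query else None


env = ActiveBandit(means=[0.0, 1.0], query_cost=0.5)
print(env.step(1, query=False))  # reward accrues but is not observed
print(env.step(1, query=True))   # reward is observed, cost c is paid
print(env.total_return)
```

An agent in this setting has to decide at each step whether the information from a query is worth c, which is the quantity the abstract describes as intractable to compute exactly.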