Active Reinforcement Learning: Observing Rewards at a Cost

  • 2020-11-13 01:01:13
  • David Krueger, Jan Leike, Owain Evans, John Salvatier


Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.
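
To make the setting concrete, here is a minimal sketch of an active multi-armed bandit in Python. It is an illustration under stated assumptions, not one of the heuristics proposed in the paper: the class `ActiveBandit`, the function `run_query_budget_agent`, the Bernoulli reward model, and the fixed per-arm query budget are all hypothetical choices made for this example. The sketch also assumes that the agent still accrues the reward when it does not query and only fails to observe it.

```python
import random


class ActiveBandit:
    """Hypothetical Bernoulli bandit where rewards are hidden unless queried."""

    def __init__(self, means, query_cost, seed=0):
        self.means = list(means)      # success probability of each arm
        self.query_cost = query_cost  # the cost c > 0 paid per query
        self.rng = random.Random(seed)

    def step(self, arm, query):
        """Pull `arm`; return (observed reward or None, net return)."""
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        observed = reward if query else None
        # Assumption: the reward is accrued either way; querying only
        # reveals it and subtracts the query cost.
        net = reward - (self.query_cost if query else 0.0)
        return observed, net


def run_query_budget_agent(env, horizon, queries_per_arm):
    """Naive baseline: query each arm a fixed number of times, then
    stop querying and exploit the arm with the best empirical mean."""
    if queries_per_arm < 1:
        raise ValueError("queries_per_arm must be at least 1")
    n_arms = len(env.means)
    counts = [0] * n_arms   # number of observed rewards per arm
    sums = [0.0] * n_arms   # sum of observed rewards per arm
    total = 0.0
    n_queries = 0
    for _ in range(horizon):
        under_sampled = [a for a in range(n_arms) if counts[a] < queries_per_arm]
        if under_sampled:
            arm, query = under_sampled[0], True
        else:
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
            query = False
        observed, net = env.step(arm, query)
        if observed is not None:
            counts[arm] += 1
            sums[arm] += observed
            n_queries += 1
        total += net
    return total, n_queries


env = ActiveBandit(means=[0.2, 0.8], query_cost=0.1, seed=0)
total, n_queries = run_query_budget_agent(env, horizon=100, queries_per_arm=5)
print(f"net return: {total:.1f}, queries paid for: {n_queries}")
```

The fixed budget `queries_per_arm` is exactly the quantity that is hard to choose well: too few queries risks committing to the wrong arm, too many wastes query cost. Quantifying that trade-off is the value-of-information question the abstract describes.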

