Active Reinforcement Learning with Monte-Carlo Tree Search

  • 2018-03-13 16:35:25
  • Sebastian Schulze, Owain Evans

Abstract

Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs. We provide an ARL algorithm using Monte-Carlo Tree Search that is asymptotically Bayes optimal. Experimentally, this algorithm is near-optimal on small Bandit problems and MDPs. On larger MDPs it outperforms a Q-learner augmented with specialised heuristics for ARL. By analysing exploration behaviour in detail, we uncover obstacles to scaling up simulation-based algorithms for ARL.
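
To make the ARL setting concrete, here is a minimal Python sketch of a Bernoulli bandit in which the reward is revealed only when the agent pays a query cost. It is an illustration under stated assumptions, not the paper's implementation: the class name `ActiveBandit`, the parameters `arm_probs` and `query_cost`, and the choice that unobserved rewards still count toward the return are all hypothetical.

```python
import random


class ActiveBandit:
    """Illustrative Bernoulli bandit where observing the reward costs `query_cost`.

    Assumption: the reward is always accrued, but it is revealed to the
    agent only when it chooses to query and pays the cost.
    """

    def __init__(self, arm_probs, query_cost, seed=0):
        self.arm_probs = list(arm_probs)  # success probability of each arm
        self.query_cost = query_cost      # price paid to see one reward
        self.rng = random.Random(seed)
        self.total_return = 0.0           # true return, net of query costs

    def step(self, arm, query):
        """Pull `arm`; return the reward if `query` is True, otherwise None."""
        reward = 1.0 if self.rng.random() < self.arm_probs[arm] else 0.0
        self.total_return += reward
        if query:
            self.total_return -= self.query_cost
            return reward
        return None  # the reward was received but stays hidden


env = ActiveBandit(arm_probs=[0.0, 1.0], query_cost=0.25)
seen = env.step(arm=1, query=True)     # pays 0.25 and observes the reward
hidden = env.step(arm=1, query=False)  # pays nothing and observes None
```

An agent that never queries gets no feedback at all, so random action selection alone teaches it nothing, which is one way to see why exploration becomes harder in this setting.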

 
