Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning

Abstract

Humans learn to play video games significantly faster than the state-of-the-art reinforcement learning (RL) algorithms. People seem to build simple models that are easy to learn to support planning and strategic exploration. Inspired by this, we investigate two issues in leveraging model-based RL for sample efficiency. First we investigate how to perform strategic exploration when exact planning is not feasible and empirically show that optimistic Monte Carlo Tree Search outperforms posterior sampling methods. Second we show how to learn simple deterministic models to support fast learning using object representation. We illustrate the benefit of these ideas by introducing a novel algorithm, Strategic Object Oriented Reinforcement Learning (SOORL), that outperforms state-of-the-art algorithms in the game of Pitfall! in less than 50 episodes.
