Model-Based Reinforcement Learning for Atari

Abstract

Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and predict which actions will lead to desirable outcomes. In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting. Our experiments evaluate SimPLe on a range of Atari games in a low data regime of 100k interactions between the agent and the environment, which corresponds to two hours of real-time play. In most games, SimPLe outperforms state-of-the-art model-free algorithms, in some games by over an order of magnitude.
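
The correspondence between 100k interactions and roughly two hours of play can be checked with a short calculation. The sketch below assumes a frame rate of 60 frames per second and a frame skip of 4 (each agent action repeated for 4 frames). Neither value is stated in the abstract; both are common Atari conventions used here only for illustration, and the function name is hypothetical.

```python
# Rough conversion from agent-environment interactions to real-time play.
# ASSUMPTIONS (not given in the abstract): 60 frames per second and a
# frame skip of 4, i.e. one agent interaction spans 4 emulator frames.
FRAMES_PER_SECOND = 60
FRAME_SKIP = 4


def interactions_to_hours(interactions, frame_skip=FRAME_SKIP, fps=FRAMES_PER_SECOND):
    """Return the hours of real-time play covered by `interactions` agent steps."""
    frames = interactions * frame_skip   # emulator frames consumed
    seconds = frames / fps               # wall-clock seconds at real-time speed
    return seconds / 3600.0


hours = interactions_to_hours(100_000)
print(f"{hours:.2f} hours")  # prints "1.85 hours"
```

Under these assumptions, 100k interactions come to about 1.85 hours, which is consistent with the two hours quoted above.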
