Abstract
Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction: substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and predict which actions will lead to desirable outcomes. In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with orders of magnitude fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting. Our experiments evaluate SimPLe on a range of Atari games, where it achieves competitive results with only 100K interactions between the agent and the environment (400K frames), which corresponds to about two hours of real-time play.
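The interaction budget quoted above can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that each agent interaction spans 4 emulator frames (implied by the 100K to 400K ratio) and that the game runs at 60 frames per second. The names `FRAME_SKIP`, `FRAMES_PER_SECOND`, and `real_time_hours` are hypothetical.

```python
# Sanity check of the interaction budget: 100K agent steps -> frames -> hours.
AGENT_STEPS = 100_000      # interactions between agent and environment (from the abstract)
FRAME_SKIP = 4             # assumed: frames per interaction, implied by 100K -> 400K
FRAMES_PER_SECOND = 60     # assumed: real-time Atari frame rate


def real_time_hours(agent_steps, frame_skip=FRAME_SKIP, fps=FRAMES_PER_SECOND):
    """Convert a number of agent interactions into hours of real-time play."""
    frames = agent_steps * frame_skip
    return frames / fps / 3600


frames = AGENT_STEPS * FRAME_SKIP
hours = real_time_hours(AGENT_STEPS)
print(frames, round(hours, 2))  # prints: 400000 1.85
```

Under these assumptions the budget comes to roughly 1.85 hours, consistent with the "about two hours" figure.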