Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning

Abstract

Reinforcement learning (RL) studies how an agent comes to achieve reward in an environment through interactions over time. Recent advances in machine RL have surpassed human expertise at the world's oldest board games and many classic video games, but they require vast quantities of experience to learn successfully -- none of today's algorithms account for the human ability to learn so many different tasks, so quickly. Here we propose a new approach to this challenge based on a particularly strong form of model-based RL which we call Theory-Based Reinforcement Learning, because it uses human-like intuitive theories -- rich, abstract, causal models of physical objects, intentional agents, and their interactions -- to explore and model an environment, and plan effectively to achieve task goals. We instantiate the approach in a video game playing agent called EMPA (the Exploring, Modeling, and Planning Agent), which performs Bayesian inference to learn probabilistic generative models expressed as programs for a game-engine simulator, and runs internal simulations over these models to support efficient object-based, relational exploration and heuristic planning. EMPA closely matches human learning efficiency on a suite of 90 challenging Atari-style video games, learning new games in just minutes of game play and generalizing robustly to new game situations and new levels. The model also captures fine-grained structure in people's exploration trajectories and learning dynamics. Its design and behavior suggest a way forward for building more general human-like AI systems.
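EMPA combines Bayesian inference over generative models with object-based exploration. The toy sketch below illustrates both ideas together and is not EMPA's actual model, which learns programs for a game-engine simulator.

The sketch makes several assumptions, none of them from the source:

- Each object type has exactly one of three hypothetical effects when the avatar touches it.
- Observations are noisy with a hand-picked noise level, and priors are uniform.
- The next exploration target is the object type whose posterior is most uncertain. This is one simple heuristic, swapped in for illustration.

All names (`EFFECTS`, `Observation`, `posterior`, `most_uncertain`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical hypothesis space: what touching an object type might do.
EFFECTS = ("harmless", "kills_avatar", "gives_reward")
# Outcome each (assumed deterministic) rule predicts when the avatar touches the object.
OUTCOME_OF = {"harmless": "nothing", "kills_avatar": "death", "gives_reward": "reward"}


@dataclass(frozen=True)
class Observation:
    obj_type: str  # which kind of object the avatar touched
    outcome: str   # observed result: "nothing", "death", or "reward"


def likelihood(effect: str, outcome: str, noise: float = 0.05) -> float:
    """P(outcome | effect): the predicted outcome gets 1 - noise; the rest share noise."""
    if OUTCOME_OF[effect] == outcome:
        return 1.0 - noise
    return noise / (len(EFFECTS) - 1)


def posterior(obj_type, observations, prior=None, noise=0.05):
    """Bayes' rule over the effect of one object type, given all observations."""
    scores = dict(prior) if prior else {e: 1.0 / len(EFFECTS) for e in EFFECTS}
    for obs in observations:
        if obs.obj_type != obj_type:
            continue  # only interactions with this object type are evidence here
        for e in EFFECTS:
            scores[e] *= likelihood(e, obs.outcome, noise)
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}


def entropy(dist) -> float:
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def most_uncertain(obj_types, observations):
    """Explore the object type whose effect the agent understands least."""
    return max(obj_types, key=lambda t: entropy(posterior(t, observations)))


if __name__ == "__main__":
    history = [Observation("red", "death"), Observation("red", "death")]
    print(posterior("red", history))                # mass concentrates on "kills_avatar"
    print(most_uncertain(["red", "blue"], history))  # "blue" has never been touched
```

After two deaths on touching "red" objects, nearly all posterior mass shifts to `kills_avatar`. The untouched "blue" type keeps a uniform posterior, which has maximal entropy, so the heuristic picks it as the next object to approach.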
