Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

  • 2019-11-19 13:58:52
  • Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
  • 363

Abstract

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games (the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled), our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
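The abstract describes a model that, applied iteratively, predicts reward, policy and value without ever querying the real environment. The following is a minimal, non-authoritative sketch of that control flow. It assumes three hypothetical toy functions, `represent`, `dynamics` and `predict`, standing in for the learned networks (the paper calls these the representation, dynamics and prediction functions, but the formulas below are invented for illustration). The planner `greedy_lookahead` is a plain one-step lookahead inside the model, not the tree search MuZero actually uses, and its `discount` default is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

NUM_ACTIONS = 3  # hypothetical: actions 0, 1, 2 shift the state by -0.1, 0, +0.1


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy stand-ins for the three learned functions. In MuZero these
# are neural networks trained jointly; here they are fixed formulas so the
# control flow runs as-is. The hidden state is a single float.
def represent(observation: Sequence[float]) -> float:
    """Representation: encode a raw observation as a hidden state (toy: mean)."""
    return sum(observation) / len(observation)


def dynamics(state: float, action: int) -> Tuple[float, float]:
    """Dynamics: predict (reward, next hidden state) for taking an action."""
    next_state = state + 0.1 * (action - 1)
    reward = -abs(next_state)  # toy reward: stay close to zero
    return reward, next_state


def predict(state: float) -> Tuple[List[float], float]:
    """Prediction: (policy over actions, value) for a hidden state."""
    logits = [-abs(state + 0.1 * (a - 1)) for a in range(NUM_ACTIONS)]
    value = -abs(state)
    return softmax(logits), value


@dataclass
class Step:
    reward: float
    policy: List[float]
    value: float


def unroll(observation: Sequence[float], actions: Sequence[int]) -> List[Step]:
    """Apply the model iteratively along a fixed action sequence.

    Every reward, policy and value comes from the model, not the environment.
    """
    state = represent(observation)
    steps = []
    for action in actions:
        reward, state = dynamics(state, action)
        policy, value = predict(state)
        steps.append(Step(reward, policy, value))
    return steps


def greedy_lookahead(observation: Sequence[float], discount: float = 0.997) -> int:
    """Pick the action maximising predicted reward + discounted predicted value.

    A one-step lookahead inside the learned model, NOT MuZero's tree search.
    """
    state = represent(observation)
    best_action, best_score = 0, -math.inf
    for action in range(NUM_ACTIONS):
        reward, next_state = dynamics(state, action)
        _, value = predict(next_state)
        score = reward + discount * value
        if score > best_score:
            best_action, best_score = action, score
    return best_action


if __name__ == "__main__":
    for k, step in enumerate(unroll([0.5, 0.5], [0, 0]), start=1):
        print(k, round(step.reward, 3), round(step.value, 3))
    print("chosen action:", greedy_lookahead([0.5, 0.5]))
```

The point of the sketch is that planning only ever calls the model: swapping the toy formulas for trained networks, and the one-step lookahead for a deeper search, leaves the interface between planner and model unchanged.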
