Thinking Fast and Slow with Deep Learning and Tree Search

Abstract

Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.
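
The loop described above (a search-based expert improves on the current policy, and an apprentice then imitates the expert) can be illustrated with a toy sketch. This is not the paper's implementation, and every simplification here is an assumption made for illustration: a single-pile Nim game stands in for Hex, a lookup table stands in for the deep neural network, and a one-ply search evaluated by apprentice rollouts stands in for full tree search. All function names (`legal_moves`, `rollout_wins`, `expert_move`, `expert_iteration`) are hypothetical.

```python
from typing import Dict, List

MAX_TAKE = 3  # toy rule (assumption): remove 1..3 stones; taking the last stone wins


def legal_moves(pile: int) -> List[int]:
    """All stone counts the player to move may remove from `pile`."""
    return list(range(1, min(MAX_TAKE, pile) + 1))


def rollout_wins(pile: int, apprentice: Dict[int, int]) -> bool:
    """Play the game out with both sides following the apprentice.

    Returns True if the player to move at `pile` ends up winning.
    States the apprentice has not learned yet default to taking 1 stone.
    """
    mover_is_root = True
    while True:
        pile -= apprentice.get(pile, 1)
        if pile == 0:
            return mover_is_root
        mover_is_root = not mover_is_root


def expert_move(pile: int, apprentice: Dict[int, int]) -> int:
    """Expert (planning): one-ply search guided by the apprentice.

    Each legal move is evaluated by an apprentice rollout from the
    resulting position; the first move that wins is returned.
    """
    for move in legal_moves(pile):
        remaining = pile - move
        if remaining == 0 or not rollout_wins(remaining, apprentice):
            return move
    # No winning move was found: keep the apprentice's current plan.
    return apprentice.get(pile, 1)


def expert_iteration(max_pile: int, iterations: int) -> Dict[int, int]:
    """Alternate expert planning with apprentice imitation."""
    apprentice: Dict[int, int] = {}
    for _ in range(iterations):
        for pile in range(1, max_pile + 1):
            # Imitation step: the table simply stores the expert's move.
            apprentice[pile] = expert_move(pile, apprentice)
    return apprentice


if __name__ == "__main__":
    policy = expert_iteration(max_pile=12, iterations=2)
    print(policy)
```

In this toy setting the expert is stronger than the apprentice because it looks one move ahead before deferring to the apprentice's plan, and the apprentice in turn makes the expert's rollouts more accurate as it improves. A table can only memorise the expert's moves; generalising them to unseen positions is the role the abstract assigns to the neural network.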
