Abstract
Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement learning (MBRL) with deep function approximation have strengthened this hypothesis, the resulting diversity of model-based methods has also made it difficult to track which components drive success and why. In this paper, we seek to disentangle the contributions of recent methods by focusing on three questions: (1) How does planning benefit MBRL agents? (2) Within planning, what choices drive performance? (3) To what extent does planning improve generalization? To answer these questions, we study the performance of MuZero (Schrittwieser et al., 2019), a state-of-the-art MBRL algorithm, under a number of interventions and ablations and across a wide range of environments including control tasks, Atari, and 9x9 Go. Our results suggest the following: (1) The primary benefit of planning is in driving policy learning. (2) Using shallow trees with simple Monte-Carlo rollouts is as performant as more complex methods, except in the most difficult reasoning tasks. (3) Planning alone is insufficient to drive strong generalization. These results indicate where and how to utilize planning in reinforcement learning settings, and highlight a number of open questions for future MBRL research.