Abstract
We propose Generative Adversarial Tree Search (GATS), a sample-efficient Deep Reinforcement Learning (DRL) algorithm. While Monte Carlo Tree Search (MCTS) is known to be effective for search and planning in RL, it is often sample-inefficient and therefore expensive to apply in practice. In this work, we develop a Generative Adversarial Network (GAN) architecture to model an environment's dynamics and a predictor model for the reward function. We exploit collected data from interaction with the environment to learn these models, which we then use for model-based planning. During planning, we deploy a finite-depth MCTS, using the learned model for tree search and a learned Q-value for the leaves, to find the best action. We theoretically show that GATS improves the bias-variance trade-off in value-based DRL. Moreover, we show that the generative model learns the model dynamics using orders of magnitude fewer samples than the Q-learner. In non-stationary settings where the environment model changes, we find the generative model adapts significantly faster than the Q-learner to the new environment.
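The planning step described above can be illustrated with a minimal sketch. It is not the GATS implementation: it replaces sampled MCTS with an exhaustive depth-limited lookahead, and it stands in for the learned GAN dynamics model, reward predictor, and Q-function with hypothetical toy functions (`toy_dynamics`, `toy_reward`, `toy_q`) on a one-dimensional integer state space. All names, and the discount `gamma`, are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

State = int
Action = int


def plan(
    state: State,
    depth: int,
    actions: Sequence[Action],
    dynamics: Callable[[State, Action], State],
    reward: Callable[[State, Action], float],
    q_value: Callable[[State, Action], float],
    gamma: float = 0.99,
) -> Tuple[float, Action]:
    """Depth-limited lookahead: expand with the model, bootstrap with Q at leaves.

    Returns (estimated return, best first action) for `state`.
    """
    best_value, best_action = float("-inf"), actions[0]
    for a in actions:
        if depth == 0:
            # Leaf node: fall back on the (learned) Q-value estimate.
            value = q_value(state, a)
        else:
            # Internal node: simulate one step with the (learned) model.
            next_state = dynamics(state, a)
            future, _ = plan(
                next_state, depth - 1, actions, dynamics, reward, q_value, gamma
            )
            value = reward(state, a) + gamma * future
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


# Hypothetical stand-ins for the learned models (assumptions, not from the paper).
def toy_dynamics(s: State, a: Action) -> State:
    return s + a  # move left (-1) or right (+1) on the integer line


def toy_reward(s: State, a: Action) -> float:
    return 1.0 if s + a == 3 else 0.0  # reward only for arriving at state 3


def toy_q(s: State, a: Action) -> float:
    return 0.0  # an uninformative leaf estimate


value, action = plan(0, 3, [-1, 1], toy_dynamics, toy_reward, toy_q, gamma=0.5)
```

In this toy setting the goal state is three steps away, so a depth-3 lookahead finds it even though the leaf Q-values carry no information; with a shallower search the result would depend entirely on the quality of the leaf estimates.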