Feedback-Based Tree Search for Reinforcement Learning

Abstract

Inspired by recent successes of Monte-Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a model-based reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multi-player online battle arena (MOBA) game King of Glory.
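
The feedback loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy chain MDP, the names `step`, `lookahead`, and `feedback_tree_search`, the tabular value and policy estimates (standing in for neural networks), and the exhaustive depth-limited lookahead (standing in for MCTS). For brevity the leaf evaluator here uses only the value estimate, whereas the abstract describes a combination of value and policy estimates.

```python
# Toy deterministic chain MDP (hypothetical, not from the paper):
# states 0..N_STATES-1, actions -1/+1, reward 1 for landing on the right-most state.
N_STATES = 6
ACTIONS = (-1, +1)
GAMMA = 0.9


def step(state, action):
    """Return (next_state, reward) for the toy chain; moves are clamped at the ends."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, 1.0 if nxt == N_STATES - 1 else 0.0


def lookahead(state, depth, value):
    """Finite-horizon search whose leaves are scored by the current value estimate.

    Exhaustive lookahead is used in place of MCTS to keep the sketch short.
    Returns (estimated return, recommended action); the action is None at a leaf.
    """
    if depth == 0:
        return value[state], None
    best_q, best_a = float("-inf"), None
    for a in ACTIONS:
        nxt, r = step(state, a)
        q = r + GAMMA * lookahead(nxt, depth - 1, value)[0]
        if q > best_q:
            best_q, best_a = q, a
    return best_q, best_a


def feedback_tree_search(batches, depth=2):
    """Run one search per state in each batch, then feed the results back."""
    value = [0.0] * N_STATES          # tabular stand-in for the value estimate
    policy = [ACTIONS[0]] * N_STATES  # tabular stand-in for the policy estimate
    for batch in batches:
        # Search every state in the batch using the evaluator from the previous iteration.
        results = {s: lookahead(s, depth, value) for s in batch}
        for s, (q, a) in results.items():
            value[s] = q   # "regression" target: the return found by the search
            policy[s] = a  # "classification" target: the action the search recommends
    return value, policy
```

Calling `feedback_tree_search` with a list of batches, for example repeated sweeps over all states or randomly sampled subsets, returns the refined value and policy tables. In this toy setting the table assignment plays the role that fitting by regression and classification plays in the approach the abstract describes.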
