ReinforceWalk: Learning to Walk in Graph with Monte Carlo Tree Search

Abstract

Learning to walk over a graph towards a target node for a given input queryand a source node is an important problem in applications such as knowledgegraph reasoning. It can be formulated as a reinforcement learning (RL) problemthat has a known state transition model, but with partial observability andsparse reward. To overcome these challenges, we develop a graph walking agentcalled ReinforceWalk, which consists of a deep recurrent neural network (RNN)and a Monte Carlo Tree Search (MCTS). To address partial observability, the RNNencodes the history of observations and map it into the Q-value, the policy andthe state value. In order to effectively train the agent from sparse reward, wecombine MCTS with the RNN policy to generate trajectories with more positiverewards. From these trajectories, we update the network in an off-policy mannerusing Q-learning and improves the RNN policy. Our proposed RL algorithmrepeatedly applies this policy improvement step to learn the entire model. Attesting stage, the MCTS is also combined with the RNN to predict the targetnode with higher accuracy. Experiment results on several graph-walkingbenchmarks show that we are able to learn better policies from less number ofrollouts compared to other baseline methods, which are mainly based on policygradient method.

Quick Read (beta)

loading the full paper ...