TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

Abstract

Combining deep model-free reinforcement learning with on-line planning is a promising approach to building on the successes of deep RL. On-line planning with look-ahead trees has proven successful in environments where transition models are known a priori. However, in complex environments where transition models need to be learned from data, the deficiencies of learned models have limited their utility for planning. To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions. TreeQN dynamically constructs a tree by recursively applying a transition model in a learned abstract state space and then aggregating predicted rewards and state-values using a tree backup to estimate Q-values. We also propose ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network. Both approaches are trained end-to-end, such that the learned model is optimised for its actual use in the tree. We show that TreeQN and ATreeC outperform n-step DQN and A2C on a box-pushing task, as well as n-step DQN and value prediction networks (Oh et al. 2017) on multiple Atari games. Furthermore, we present ablation studies that demonstrate the effect of different auxiliary losses on learning transition models.
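The tree backup described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation. The scalar abstract state and the toy functions step, reward and value are hypothetical stand-ins for the learned transition, reward and value networks. We also assume a plain max backup at internal nodes, with the value estimate used only at the leaves; the paper's exact backup rule may differ. The softmax_policy function is a hypothetical illustration of the ATreeC idea of turning the tree's outputs into a stochastic policy with a softmax layer.

```python
import math
from typing import Callable, List

# Hypothetical stand-ins for TreeQN's learned components. In the real model these
# are neural networks acting on a learned abstract state; here the state is a float.
def step(z: float, a: int) -> float:
    """Toy transition model: action 1 moves right, action 0 moves left."""
    return z + (1.0 if a == 1 else -1.0)

def reward(z: float, a: int) -> float:
    """Toy reward model: action 1 earns a reward of 1."""
    return 1.0 if a == 1 else 0.0

def value(z: float) -> float:
    """Toy state-value model, used only to bootstrap at the leaves."""
    return 0.25 * z

def tree_q_values(
    z: float,
    n_actions: int,
    depth: int,
    transition: Callable[[float, int], float],
    reward_fn: Callable[[float, int], float],
    value_fn: Callable[[float], float],
    gamma: float = 0.99,
) -> List[float]:
    """Expand every action `depth` times and back up Q(z, a) = r(z, a) + gamma * V(z').

    Assumption: internal nodes back up with a plain max over child Q-values;
    leaves use the value model directly.
    """
    q_values = []
    for a in range(n_actions):
        z_next = transition(z, a)
        if depth == 1:
            v_next = value_fn(z_next)  # leaf: bootstrap from the value model
        else:
            v_next = max(
                tree_q_values(z_next, n_actions, depth - 1,
                              transition, reward_fn, value_fn, gamma)
            )
        q_values.append(reward_fn(z, a) + gamma * v_next)
    return q_values

def softmax_policy(logits: List[float], temperature: float = 1.0) -> List[float]:
    """ATreeC-style policy head (sketch): a softmax over the tree's outputs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    q = tree_q_values(0.0, n_actions=2, depth=2, transition=step,
                      reward_fn=reward, value_fn=value, gamma=0.5)
    print(q)                  # [0.5, 1.625]
    print(softmax_policy(q))  # action 1 receives most of the probability mass
```

In this sketch every action is expanded at every level, so the number of leaves grows as n_actions ** depth; with two actions and depth 2 there are four leaves. Because every step is an ordinary arithmetic operation, the same computation written in an autodiff framework would be differentiable end-to-end, which is the property the abstract relies on.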
