Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning

Abstract

We present a two-step hybrid reinforcement learning (RL) policy that is designed to generate interpretable and robust hierarchical policies on the RL problem with graph-based input. Unlike prior deep reinforcement learning policies parameterized by an end-to-end black-box graph neural network, our approach disentangles the decision-making process into two steps. The first step is a simplified classification problem that maps the graph input to an action group where all actions share a similar semantic meaning. The second step implements a sophisticated rule-miner that conducts explicit one-hop reasoning over the graph and identifies decisive edges in the graph input without the necessity of heavy domain knowledge. This two-step hybrid policy presents human-friendly interpretations and achieves better performance in terms of generalization and robustness. Extensive experimental studies on four levels of complex text-based games have demonstrated the superiority of the proposed method compared to the state-of-the-art.
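
To make the two-step decomposition concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. Everything in it is an assumption made for illustration: the `Edge` triple format, the `ACTION_GROUPS` and `RULES` tables, and the function names are hypothetical, and the hand-written heuristic in `choose_group` merely stands in for the learned classifier of the first step. The second step checks a single one-hop edge per candidate action and returns that edge as the explanation.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Edge(NamedTuple):
    """One (subject, relation, object) triple of the graph input."""
    subj: str
    rel: str
    obj: str


# Hypothetical action groups: actions in a group share the same verb.
ACTION_GROUPS = {
    "take": ["take apple", "take knife"],
    "cook": ["cook apple", "cook potato"],
}

# Hypothetical mined rules: the (relation, object) pattern that the
# action's argument must satisfy through a single edge in the graph.
RULES = {
    "take": ("at", "kitchen"),    # the item must be in the current room
    "cook": ("in", "inventory"),  # the item must already be carried
}


def choose_group(graph: Set[Edge]) -> str:
    """Step 1 stand-in: a toy heuristic in place of a learned classifier."""
    carrying = any(e.rel == "in" and e.obj == "inventory" for e in graph)
    return "cook" if carrying else "take"


def choose_action(graph: Set[Edge], group: str) -> Tuple[Optional[str], Optional[Edge]]:
    """Step 2: pick the first action in the group backed by a one-hop edge."""
    rel, obj = RULES[group]
    for action in ACTION_GROUPS[group]:
        target = action.split()[1]
        edge = Edge(target, rel, obj)
        if edge in graph:
            return action, edge  # the edge doubles as the explanation
    return None, None


def policy(graph: Set[Edge]):
    """Run both steps and return (group, action, decisive edge)."""
    group = choose_group(graph)
    action, edge = choose_action(graph, group)
    return group, action, edge
```

Calling `policy({Edge("apple", "at", "kitchen")})` in this toy setup selects the `take` group and then `take apple`, and reports the edge `(apple, at, kitchen)` as the reason for the choice.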
