Self-Supervised Discovering of Causal Features: Towards Interpretable Reinforcement Learning

Abstract

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent's decision-making process is generally not transparent. The lack of interpretability hinders the applicability of RL in safety-critical scenarios. While several methods have attempted to interpret vision-based RL, most come without a detailed explanation of the agent's behaviour. In this paper, we propose a self-supervised interpretable framework, which can discover causal features to enable easy interpretation of RL agents even for non-experts. Specifically, a self-supervised interpretable network (SSINet) is employed to produce fine-grained attention masks for highlighting task-relevant information, which constitutes most of the evidence for the agent's decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown, which is a challenging self-driving car simulator environment. The results show that our method renders causal explanations and empirical evidence about how the agent makes decisions and why the agent performs well or badly, especially when transferred to novel scenes. Overall, our method provides valuable insight into the internal decision-making process of vision-based RL. In addition, our method does not use any external labelled data, and thus demonstrates the possibility of learning high-quality masks in a self-supervised manner, which may shed light on new paradigms for label-free vision learning such as self-supervised segmentation and detection.
