Self-Supervised Discovering of Causal Features: Towards Interpretable Reinforcement Learning

Abstract

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent's decision-making process is generally not transparent, and this lack of interpretability hinders the applicability of RL in safety-critical scenarios. In this paper, we propose a self-supervised interpretable framework that employs a self-supervised interpretable network (SSINet) to discover and locate the fine-grained causal features that constitute most of the evidence for the agent's decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown. The results show that our method provides causal explanations and empirical evidence of how the agent makes decisions and why it performs well or badly. Moreover, our method is a flexible explanatory module that can be applied to most vision-based RL agents. Overall, our method provides valuable insight into interpretable vision-based RL.
