Self-Supervised Discovering of Interpretable Features for Reinforcement Learning

Abstract

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent's decision-making process is generally not transparent. The lack of interpretability hinders the applicability of RL in safety-critical scenarios. While several methods have attempted to interpret vision-based RL, most come without a detailed explanation of the agent's behavior. In this paper, we propose a self-supervised interpretable framework, which can discover interpretable features to enable easy understanding of RL agents even for non-experts. Specifically, a self-supervised interpretable network (SSINet) is employed to produce fine-grained attention masks for highlighting task-relevant information, which constitutes most of the evidence for the agent's decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown, which is a challenging self-driving car simulator environment. The results show that our method provides empirical evidence about how the agent makes decisions and why the agent performs well or badly, especially when transferred to novel scenes. Overall, our method provides valuable insight into the internal decision-making process of vision-based RL. In addition, our method does not use any external labelled data, and thus demonstrates the possibility of learning high-quality masks in a self-supervised manner, which may shed light on new paradigms for label-free vision learning such as self-supervised segmentation and detection.
