Pay Attention to What and Where? Interpretable Feature Extractor in Vision-based Deep Reinforcement Learning

Abstract

Current approaches in Explainable Deep Reinforcement Learning are limited in that the attention mask is spatially displaced from the objects in the visual input. This work addresses this spatial problem within traditional Convolutional Neural Networks (CNNs). We propose the Interpretable Feature Extractor (IFE) architecture, aimed at generating an accurate attention mask that illustrates both "what" the agent concentrates on and "where" in the spatial domain. Our design incorporates a Human-Understandable Encoding module to generate a fully interpretable attention mask, followed by an Agent-Friendly Encoding module to enhance the agent's learning efficiency. Together, these two components form the Interpretable Feature Extractor for vision-based deep reinforcement learning, which makes the model interpretable. The resulting attention mask is consistent, highly understandable by humans, accurate in the spatial dimension, and effectively highlights important objects or locations in the visual input. The Interpretable Feature Extractor is integrated into the Fast and Data-efficient Rainbow framework and evaluated on 57 ATARI games to show the effectiveness of the proposed approach in terms of Spatial Preservation, Interpretability, and Data-efficiency. Finally, we showcase the versatility of our approach by incorporating the IFE into the Asynchronous Advantage Actor-Critic Model.
