Explainable Deep Reinforcement Learning: State of the Art and Challenges

Abstract

Interpretability, explainability and transparency are key issues in introducing Artificial Intelligence methods in many critical domains. This is important due to ethical concerns and trust issues strongly connected to reliability, robustness, auditability and fairness, and has important consequences for keeping the human in the loop at high levels of automation, especially in critical cases of decision making, where both the human and the machine play important roles. While the research community has given much attention to the explainability of closed (or black) prediction boxes, there is a tremendous need for explainability of closed-box methods that support agents to act autonomously in the real world. Reinforcement learning methods, and especially their deep versions, are such closed-box methods. In this article we aim to provide a review of state-of-the-art methods for explainable deep reinforcement learning, taking also into account the needs of human operators, i.e., of those who make the actual and critical decisions in solving real-world problems. We provide a formal specification of the deep reinforcement learning explainability problems, and we identify the necessary components of a general explainable reinforcement learning framework. Based on these, we provide a comprehensive review of state-of-the-art methods, categorizing them into classes according to the paradigm they follow, the interpretable models they use, and the surface representation of the explanations provided. The article concludes by identifying open questions and important challenges.
