Distal Explanations for Model-free Explainable Reinforcement Learning

Abstract

In this paper we introduce and evaluate a distal explanation model for model-free reinforcement learning agents that can generate explanations for 'why' and 'why not' questions. Our starting point is the observation that causal models can generate opportunity chains that take the form of 'A enables B and B causes C'. Using insights from an analysis of 240 explanations generated in a human-agent experiment, we define a distal explanation model that can analyse counterfactuals and opportunity chains using decision trees and causal models. A recurrent neural network is employed to learn opportunity chains, and decision trees are used to improve the accuracy of task prediction and the generated counterfactuals. We computationally evaluate the model in 6 reinforcement learning benchmarks using different reinforcement learning algorithms. From a study with 90 human participants, we show that our distal explanation model results in improved outcomes over three scenarios compared with two baseline explanation models.
