Experiential Explanations for Reinforcement Learning

Abstract

Reinforcement Learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due, in part, to the sequential nature of RL, in which actions are chosen because of future rewards. However, RL agents discard the qualitative features of their training, making it hard to recover user-understandable information about "why" an action is chosen. We propose a technique, Experiential Explanations, to generate counterfactual explanations by training influence predictors alongside the RL policy. Influence predictors are models that learn how sources of reward affect the agent in different states, thus restoring information about how the policy reflects the environment. A human evaluation study revealed that participants presented with experiential explanations were better able to correctly guess what an agent would do than those presented with other standard types of explanations. Participants also found experiential explanations to be more understandable, satisfying, complete, useful, and accurate. The qualitative analysis provides insights into the factors of experiential explanations that participants find most useful.
