Experiential Explanations for Reinforcement Learning

Abstract

Reinforcement Learning (RL) approaches are becoming increasingly popular in various key disciplines, including robotics and healthcare. However, many of these systems are complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. One of the challenges of explaining RL agent behavior is that, when learning to predict future expected rewards, agents discard contextual information about their experiences when training in an environment and rely solely on expected utility. We propose a technique, Experiential Explanations, for generating local counterfactual explanations that can answer users' why-not questions by explaining the qualitative effects of the various environmental rewards on the agent's behavior. We achieve this by training additional models alongside the policy model. These models, called influence predictors, capture how different reward sources influence the agent's policy, thus restoring lost contextual information about how the policy reflects the environment. To generate explanations, we use these influence predictors in addition to the policy model to contrast the agent's intended behavior trajectory with a counterfactual trajectory suggested by the user. A human evaluation study revealed that participants had a higher probability of correctly predicting the agent's subsequent action when presented with Experiential Explanations than with other explanation types. Moreover, compared to other baseline types, participants found Experiential Explanations more useful and more often utilized the kinds of information presented in them when reasoning about the agent's actions. Experiential Explanations also outperformed other explanations in understandability, satisfaction, amount of detail, completeness, usefulness, and accuracy.
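The abstract does not say how the influence predictors are built or trained, so the following is only a toy illustration of the general idea, not the authors' implementation. It assumes one tabular value estimate per reward source, learned with plain TD(0), and contrasts two trajectories by their mean per-source value. All names (`train_influence_predictors`, `contrast`, the `goal` and `lava` sources, the four-state environment) are hypothetical.

```python
from collections import defaultdict


def train_influence_predictors(transitions, sources, gamma=0.9, alpha=0.1, epochs=500):
    """Learn one tabular value estimate per reward source with TD(0).

    transitions: list of (state, next_state, {source: reward}) tuples.
    This tabular form is an assumption made for illustration only.
    """
    tables = {src: defaultdict(float) for src in sources}
    for _ in range(epochs):
        for state, next_state, rewards in transitions:
            for src in sources:
                reward = rewards.get(src, 0.0)
                target = reward + gamma * tables[src][next_state]
                tables[src][state] += alpha * (target - tables[src][state])
    return tables


def trajectory_influence(tables, trajectory):
    """Mean predicted influence of each reward source along a trajectory."""
    return {
        src: sum(table[s] for s in trajectory) / len(trajectory)
        for src, table in tables.items()
    }


def contrast(tables, intended, counterfactual):
    """Per-source difference: counterfactual influence minus intended influence."""
    a = trajectory_influence(tables, intended)
    b = trajectory_influence(tables, counterfactual)
    return {src: b[src] - a[src] for src in tables}


# Hypothetical toy environment: two routes to the goal, one passing lava.
transitions = [
    ("start", "hall", {}),
    ("hall", "goal", {"goal": 1.0}),
    ("start", "lava_edge", {}),
    ("lava_edge", "goal", {"goal": 1.0, "lava": -1.0}),
]
tables = train_influence_predictors(transitions, sources=["goal", "lava"])

intended = ["start", "hall", "goal"]
counterfactual = ["start", "lava_edge", "goal"]
diff = contrast(tables, intended, counterfactual)

for src, delta in diff.items():
    print(f"{src}: {delta:+.3f}")
```

In this toy setting both routes look the same with respect to the goal reward, while the user's suggested route scores lower on the lava source. That per-source gap is the kind of qualitative signal a why-not explanation could verbalize, for example "the agent avoided that path because of the lava penalty".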
