Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities

Abstract

While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are focused on developers and expert users. Counterfactual explanations are local explanations that offer users advice on what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While counterfactuals have been extensively researched in supervised learning, few methods apply them to reinforcement learning (RL). In this work, we explore the reasons for the underrepresentation of a powerful explanation method in RL. We start by reviewing the current work in counterfactual explanations in supervised learning. Additionally, we explore the differences between counterfactual explanations in supervised learning and RL and identify the main challenges that prevent the adoption of methods from supervised learning in RL. Finally, we redefine counterfactuals for RL and propose research directions for implementing counterfactuals in RL.
