Counterfactual Explanations for Continuous Action Reinforcement Learning

Abstract

Reinforcement Learning (RL) has shown great promise in domains like healthcare and robotics but often struggles with adoption due to its lack of interpretability. Counterfactual explanations, which address "what if" scenarios, provide a promising avenue for understanding RL decisions but remain underexplored for continuous action spaces. We propose a novel approach for generating counterfactual explanations in continuous action RL by computing alternative action sequences that improve outcomes while minimizing deviations from the original sequence. Our approach leverages a distance metric for continuous actions and accounts for constraints such as adhering to predefined policies in specific states. Evaluations in two RL domains, Diabetes Control and Lunar Lander, demonstrate the effectiveness, efficiency, and generalization of our approach, enabling more interpretable and trustworthy RL applications.
