Semifactual Explanations for Reinforcement Learning

Abstract

Reinforcement Learning (RL) is a learning paradigm in which an agent learns from its environment through trial and error. Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret. Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks. Semifactual explanations aim to explain an outcome by providing "even if" scenarios, such as "even if the car were moving twice as slowly, it would still have to swerve to avoid crashing". Semifactuals help users understand the effects of different factors on the outcome and support the optimisation of resources. While extensively studied in psychology and even utilised in supervised learning, semifactuals have not been used to explain the decisions of RL systems. In this work, we develop a first approach to generating semifactual explanations for RL agents. We start by defining five properties of desirable semifactual explanations in RL and then introduce SGRL-Rewind and SGRL-Advance, the first algorithms for generating semifactual explanations in RL. We evaluate the algorithms in two standard RL environments and find that they generate semifactuals that are easier to reach, represent the agent's policy better, and are more diverse compared to baselines. Lastly, we conduct and analyse a user study to assess the participants' perception of semifactual explanations of the agent's actions.
