(When) Are Contrastive Explanations of Reinforcement Learning Helpful?

Abstract

Global explanations of a reinforcement learning (RL) agent's expected behavior can make it safer to deploy. However, such explanations are often difficult to understand because of the complicated nature of many RL policies. Effective human explanations are often contrastive, referencing a known contrast (policy) to reduce redundancy. At the same time, these explanations also require the additional effort of referencing that contrast when evaluating an explanation. We conduct a user study to understand whether and when contrastive explanations might be preferable to complete explanations that do not require referencing a contrast. We find that complete explanations are generally more effective when they are the same size or smaller than a contrastive explanation of the same policy, and no worse when they are larger. This suggests that contrastive explanations are not sufficient to solve the problem of effectively explaining reinforcement learning policies, and require additional careful study for use in this context.
