Counterfactual Visual Explanations

Abstract

In this work, we develop a technique to produce counterfactual visual explanations. Given a 'query' image $I$ for which a vision system predicts class $c$, a counterfactual visual explanation identifies how $I$ could change such that the system would output a different specified class $c'$. To do this, we select a 'distractor' image $I'$ that the system predicts as class $c'$ and identify spatial regions in $I$ and $I'$ such that replacing the identified region in $I$ with the identified region in $I'$ would push the system towards classifying $I$ as $c'$. We apply our approach to multiple image classification datasets, generating qualitative results showcasing the interpretability and discriminativeness of our counterfactual explanations. To explore the effectiveness of our explanations in teaching humans, we present machine teaching experiments for the task of fine-grained bird classification. We find that users trained to distinguish bird species fare better when given access to counterfactual explanations in addition to training examples.
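The region-replacement idea described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each image is represented by a grid of per-region feature vectors, the classifier is a linear head over mean-pooled features, and the best edit is found by exhaustively trying every single region swap. The function name `best_single_swap` and all parameter names are hypothetical.

```python
import numpy as np

def best_single_swap(query_feats, distractor_feats, head_weights, target_class):
    """Find the single region swap that most raises the target-class logit.

    query_feats, distractor_feats: arrays of shape (num_cells, dim) holding
        hypothetical per-region features of I and I'.
    head_weights: array of shape (dim, num_classes), an assumed linear head
        applied to the mean-pooled region features.
    Returns (query_cell, distractor_cell, logit_after_swap).
    """
    best = (None, None, -np.inf)
    for i in range(query_feats.shape[0]):
        for j in range(distractor_feats.shape[0]):
            # Replace region i of the query with region j of the distractor.
            edited = query_feats.copy()
            edited[i] = distractor_feats[j]
            # Score the edited query for the target class c'.
            logit = (edited.mean(axis=0) @ head_weights)[target_class]
            if logit > best[2]:
                best = (i, j, logit)
    return best
```

The returned pair of indices plays the role of the explanation: it names the region of $I$ to replace and the region of $I'$ to replace it with. The exhaustive search costs one forward pass of the head per pair of regions, which is only practical here because the assumed head is cheap.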
