Transparent Trade-offs between Properties of Explanations

Abstract

When explaining black-box machine learning models, it's often important for explanations to have certain desirable properties. Most existing methods 'encourage' desirable properties in their construction of explanations. In this work, we demonstrate that these forms of encouragement do not consistently create explanations with the properties that are supposedly being targeted. Moreover, they do not allow for any control over which properties are prioritized when different properties are at odds with each other. We propose to directly optimize explanations for desired properties. Our direct approach not only produces explanations with optimal properties more consistently but also empowers users to control trade-offs between different properties, allowing them to create explanations with exactly what is needed for a particular task.
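As a rough illustration of what directly optimizing an explanation for several properties at once could look like, here is a minimal Python sketch under stated assumptions. It is not the paper's method. The explanation is assumed to be a local linear attribution vector `w` around an input `x0`. The two properties are assumed to be fidelity (squared error between the linear surrogate and the model on random perturbations) and sparsity (the L1 norm of `w`). A single weight `lam` sets the trade-off between them, and the combined objective is minimized with proximal gradient descent. The model `black_box`, the function `explain`, and every parameter value are hypothetical.

```python
import random

def black_box(x):
    # Hypothetical opaque model: strong dependence on x[0], weak on x[1], none on x[2].
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero by t, clipping to exactly 0.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def explain(model, x0, lam, n_samples=200, sigma=0.5, step=0.5, iters=300, seed=0):
    # Minimize  fidelity(w) + lam * ||w||_1  over a local linear explanation w.
    rng = random.Random(seed)
    d = len(x0)
    f0 = model(x0)
    # Gaussian perturbations around x0 and the model's output change for each one.
    deltas = [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(n_samples)]
    targets = [model([a + b for a, b in zip(x0, dz)]) - f0 for dz in deltas]
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the mean squared fidelity error with respect to w.
        grad = [0.0] * d
        for dz, y in zip(deltas, targets):
            r = y - sum(wi * di for wi, di in zip(w, dz))
            for j in range(d):
                grad[j] += -2.0 * r * dz[j] / n_samples
        # Gradient step on fidelity, then proximal (shrinkage) step on the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    fidelity = sum(
        (y - sum(wi * di for wi, di in zip(w, dz))) ** 2
        for dz, y in zip(deltas, targets)
    ) / n_samples
    return w, fidelity

if __name__ == "__main__":
    x0 = [1.0, 1.0, 1.0]
    for lam in (0.0, 0.5):
        w, fid = explain(black_box, x0, lam)
        print(lam, [round(v, 3) for v in w], round(fid, 4))
```

With `lam = 0.0` the sketch recovers a faithful but denser attribution. Raising `lam` zeroes out weak features at the cost of some fidelity. This makes the trade-off an explicit, user-chosen knob rather than a side effect of how the explanation is constructed.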
