Generative Counterfactual Introspection for Explainable Deep Learning

Abstract

In this work, we propose an introspection technique for deep neural networks that relies on a generative model to make salient edits to the input image for model interpretation. Such edits provide the fundamental interventional operation that lets us answer counterfactual queries, i.e., what meaningful change to the input image would alter the prediction. We demonstrate how the proposed introspection approach reveals interesting properties of given classifiers on both the MNIST and CelebA datasets.
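To make the counterfactual query concrete, here is a minimal sketch of one common way to pose it: search the generator's latent space for a code near the original one whose decoded image flips the classifier's prediction, trading a prediction loss against the distance from the starting code. This is an illustrative assumption, not necessarily the authors' formulation. The linear "generator", the logistic "classifier", their weights, and the names `generate`, `prob`, and `counterfactual` are all hypothetical toys chosen so the example runs with the standard library alone.

```python
import math

# Hypothetical toy linear "generator": maps a 2-D latent code z to a 4-pixel image x = W z + b.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.5, 0.5]]
b = [0.0, 0.0, 0.0, 0.0]

# Hypothetical toy logistic "classifier" on the 4-pixel image.
w = [1.0, -1.0, 0.5, 0.0]
c = 0.0


def generate(z):
    """Decode a latent code into an image (here: a linear map)."""
    return [sum(W[i][j] * z[j] for j in range(len(z))) + b[i] for i in range(len(W))]


def prob(x):
    """Classifier probability of the positive class."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + c
    return 1.0 / (1.0 + math.exp(-s))


def counterfactual(z0, target, lam=0.1, lr=0.1, steps=500):
    """Gradient descent on the latent code z minimizing
    BCE(prob(generate(z)), target) + lam * ||z - z0||^2.
    The penalty keeps the edit close to the original image."""
    z = list(z0)
    for _ in range(steps):
        p = prob(generate(z))
        # d BCE / d logit = p - target; chain rule through the classifier w and generator W.
        dlogit = p - target
        grad = []
        for j in range(len(z)):
            dlogit_dzj = sum(w[i] * W[i][j] for i in range(len(W)))
            grad.append(dlogit * dlogit_dzj + 2.0 * lam * (z[j] - z0[j]))
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


if __name__ == "__main__":
    z0 = [-1.0, 1.0]
    z_cf = counterfactual(z0, target=1.0)
    x0, x_cf = generate(z0), generate(z_cf)
    print("original p =", round(prob(x0), 3), "counterfactual p =", round(prob(x_cf), 3))
    # The per-pixel difference is the "edit" that explains the changed prediction.
    print("edit:", [round(a - b_, 3) for a, b_ in zip(x_cf, x0)])
```

Because the search happens in the generator's latent space rather than directly in pixel space, the resulting edit stays on whatever the generator can produce, which is what makes the change "meaningful" rather than an arbitrary pixel perturbation; the weight `lam` controls how far the counterfactual may drift from the original.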
