Counterfactual Generative Networks

Abstract

Neural networks are prone to learning shortcuts: they often model simple correlations, ignoring more complex ones that potentially generalize better. Prior works on image classification show that instead of learning a connection to object shape, deep classifiers tend to exploit spurious correlations with low-level texture or the background for solving the classification task. In this work, we take a step towards more robust and interpretable classifiers that explicitly expose the task's causal structure. Building on current advances in deep generative modeling, we propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision. By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background; hence, they allow for generating counterfactual images. We demonstrate the ability of our model to generate such images on MNIST and ImageNet. Further, we show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task, despite being synthetic. Lastly, our generative model can be trained efficiently on a single GPU, exploiting common pre-trained models as inductive biases.
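To make the idea of independent mechanisms concrete, the following is a minimal toy sketch, not the model described in the paper. It assumes that a shape mask, a texture image, and a background image are combined by simple alpha blending; the abstract does not state the composition rule, so this is an assumption. The three `*_mechanism` functions are hypothetical stand-ins that return fixed or random arrays, where a real system would use trained generative networks. A counterfactual image is obtained by giving each mechanism a different class label.

```python
import numpy as np


def shape_mechanism(label, size=8):
    # Hypothetical stand-in: a square binary mask whose extent depends on the label.
    mask = np.zeros((size, size, 1))
    k = 1 + label % 3
    mask[k:size - k, k:size - k] = 1.0
    return mask


def texture_mechanism(label, size=8):
    # Hypothetical stand-in: a label-specific random RGB texture.
    rng = np.random.default_rng(100 + label)
    return rng.uniform(0.0, 1.0, (size, size, 3))


def background_mechanism(label, size=8):
    # Hypothetical stand-in: a label-specific random RGB background.
    rng = np.random.default_rng(200 + label)
    return rng.uniform(0.0, 1.0, (size, size, 3))


def compose(mask, foreground, background):
    # Assumed composition rule: alpha-blend the foreground over the background.
    return mask * foreground + (1.0 - mask) * background


def generate(shape_label, texture_label, background_label):
    # Each mechanism receives its own label, so the labels can be varied independently.
    m = shape_mechanism(shape_label)
    f = texture_mechanism(texture_label)
    b = background_mechanism(background_label)
    return compose(m, f, b)


# Same label everywhere: an image consistent with a single class.
factual = generate(0, 0, 0)
# Mixed labels: shape of class 0, texture of class 1, background of class 2.
counterfactual = generate(0, 1, 2)
```

Because the mechanisms are called independently, changing one label alters only the corresponding factor of the image. This independence is the property the abstract relies on for generating counterfactual training data.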
