Causal ImageNet: How to discover spurious features in Deep Learning?

Abstract

A key reason for the lack of reliability of deep neural networks in the real world is their heavy reliance on spurious input features that are causally unrelated to the true label. Focusing on image classification, we define causal attributes as the set of visual features that are always a part of the object, while spurious attributes are the ones that are likely to co-occur with the object but are not a part of it (e.g., the attribute "fingers" for the class "band aid"). Traditional methods for discovering spurious features either require extensive human annotations (and thus are not scalable) or are useful only for specific models. In this work, we introduce a scalable framework to discover a subset of spurious and causal visual attributes used in the inferences of a general model and to localize them on a large number of images with minimal human supervision. Our methodology is based on this key idea: to identify spurious or causal visual attributes used in model predictions, we identify spurious or causal neural features (penultimate-layer neurons of a robust model) via limited human supervision (e.g., using the top 5 activating images per feature). We then show that these neural feature annotations generalize extremely well to many more images without any human supervision. We use the activation maps for these neural features as soft masks to highlight spurious or causal visual attributes. Using this methodology, we introduce the Causal ImageNet dataset, containing causal and spurious masks for a large set of samples from ImageNet. We assess the performance of several popular ImageNet models and show that they rely heavily on various spurious features in their predictions.
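
As a rough illustration of two steps the abstract mentions (picking the top activating images for a neural feature, and turning an activation map into a soft mask), the following minimal sketch may help. It is not the authors' implementation: the function names `top_k_images`, `soft_mask`, and `apply_mask` are hypothetical, the activations are assumed to have been computed elsewhere, and min-max normalization is only one plausible way to obtain a mask in [0, 1].

```python
import numpy as np


def top_k_images(activations, feature, k=5):
    """Return indices of the k images that most strongly activate one feature.

    activations: array of shape (num_images, num_features), assumed to hold
    penultimate-layer activations computed elsewhere (hypothetical input).
    """
    # Sort ascending, reverse to descending, then keep the first k indices.
    order = np.argsort(activations[:, feature])[::-1]
    return order[:k]


def soft_mask(activation_map, eps=1e-8):
    """Min-max normalize a 2D activation map into a soft mask in [0, 1].

    Assumption: simple min-max scaling; the paper may normalize differently.
    """
    a = np.asarray(activation_map, dtype=float)
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo + eps)


def apply_mask(image, mask):
    """Weight an (H, W, C) image by an (H, W) soft mask to highlight a region."""
    return image * mask[..., None]
```

In this sketch a human would inspect the images returned by `top_k_images` to label a feature as causal or spurious, and `apply_mask` would then visualize where that feature fires on any other image.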
