Abstract
A key reason for the lack of reliability of deep neural networks in the real world is their heavy reliance on {\it spurious} input features that are causally unrelated to the true label. Focusing on image classification, we define causal attributes as the set of visual features that are always a part of the object, while spurious attributes are those that are likely to {\it co-occur} with the object but are not a part of it (e.g., attribute ``fingers'' for class ``band aid''). Traditional methods for discovering spurious features either require extensive human annotations (and thus are not scalable) or are useful only on specific models. In this work, we introduce a {\it scalable} framework to discover a subset of spurious and causal visual attributes used in inferences of a general model and localize them on a large number of images with minimal human supervision. Our methodology is based on this key idea: to identify spurious or causal \textit{visual attributes} used in model predictions, we identify spurious or causal \textit{neural features} (penultimate layer neurons of a robust model) via limited human supervision (e.g., using top 5 activating images per feature). We then show that these neural feature annotations {\it generalize} extremely well to many more images {\it without} any human supervision. We use the activation maps for these neural features as soft masks to highlight spurious or causal visual attributes. Using this methodology, we introduce the {\it Causal Imagenet} dataset containing causal and spurious masks for a large set of samples from Imagenet. We assess the performance of several popular Imagenet models and show that they rely heavily on various spurious features in their predictions.
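
The two mechanical steps named above, selecting the top-activating images for a neural feature and turning a feature's activation map into a soft mask, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_k_images` and `soft_mask`, the array shapes, the min-max normalization, and the nearest-neighbor upsampling are all assumptions chosen for illustration, and the abstract does not specify how the masks are actually normalized or resized.

```python
import numpy as np

def top_k_images(activations, feature, k=5):
    """Return indices of the k images that most strongly activate one feature.

    activations: array of shape (num_images, num_features), assumed to hold
    penultimate-layer activations (hypothetical layout).
    """
    # Sort image indices by descending activation of the chosen feature.
    order = np.argsort(-activations[:, feature], kind="stable")
    return order[:k]

def soft_mask(feature_map, scale):
    """Turn a 2D activation map into a soft mask with values in [0, 1].

    Assumption: min-max normalization followed by nearest-neighbor
    upsampling by an integer factor; a real pipeline might interpolate
    differently.
    """
    fmap = np.asarray(feature_map, dtype=float)
    lo, hi = fmap.min(), fmap.max()
    if hi > lo:
        norm = (fmap - lo) / (hi - lo)
    else:
        # A constant map carries no localization signal.
        norm = np.zeros_like(fmap)
    # Repeat each cell into a scale x scale block.
    return np.kron(norm, np.ones((scale, scale)))
```

Under these assumptions, a human would inspect only the images returned by `top_k_images` to label a feature as causal or spurious, and `soft_mask` would then be applied to that feature's activation map on every other image without further annotation.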