Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector

Abstract

As deep vision models' popularity rapidly increases, there is a growingemphasis on explanations for model predictions. The inherently explainableattribution method aims to enhance the understanding of model behavior byidentifying the important regions in images that significantly contribute topredictions. It is achieved by cooperatively training a selector (generating anattribution map to identify important features) and a predictor (makingpredictions using the identified features). Despite many advancements, existingmethods suffer from the incompleteness problem, where discriminative featuresare masked out, and the interlocking problem, where the non-optimized selectorinitially selects noise, causing the predictor to fit on this noise andperpetuate the cycle. To address these problems, we introduce a new objectivethat discourages the presence of discriminative features in the masked-outregions thus enhancing the comprehensiveness of feature selection. Apre-trained detector is introduced to detect discriminative features in themasked-out region. If the selector selects noise instead of discriminativefeatures, the detector can observe and break the interlocking situation bypenalizing the selector. Extensive experiments show that our model makesaccurate predictions with higher accuracy than the regular black-box model, andproduces attribution maps with high feature coverage, localization ability,fidelity and robustness. Our code will be available at\href{https://github.com/Zood123/COMET}{https://github.com/Zood123/COMET}.

Quick Read (beta)

loading the full paper ...