Characterizing the Interpretability of Attention Maps in Digital Pathology

Abstract

Interpreting machine learning model decisions is crucial for high-risk applications like healthcare. In digital pathology, large whole slide images (WSIs) are decomposed into smaller tiles, and tile-derived features are processed by attention-based multiple instance learning (ABMIL) models to predict WSI-level labels. These networks generate tile-specific attention weights, which can be visualized as attention maps for interpretability. However, a standardized evaluation framework for these maps is lacking, which calls into question their reliability and their ability to detect spurious correlations that can mislead models. We herein propose a framework to assess the ability of attention networks to attend to relevant features in digital pathology by creating artificial model confounders and using dedicated interpretability metrics. Models are trained and evaluated on data with tile modifications correlated with WSI labels, enabling the analysis of model sensitivity to artificial confounders and of the accuracy of attention maps in highlighting them. Confounders are introduced either through synthetic tile modifications or through tile ablations based on their specific image-based features, with the latter being used to assess more clinically relevant scenarios. We also analyze the impact of varying confounder quantities at both the tile and WSI levels. Our results show that ABMIL models perform as desired within our framework. While attention maps generally highlight relevant regions, their robustness is affected by the type and number of confounders. Our versatile framework has the potential to be used in the evaluation of various methods and the exploration of image-based features driving model predictions, which could aid in biomarker discovery.
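
As a rough illustration of the setup described above, the following minimal sketch pools tile features with attention weights, injects a synthetic confounder into a fraction of tiles, and measures how much attention lands on the modified tiles. It is a sketch under stated assumptions, not the paper's implementation: the tanh-based attention form, the feature-space offset used as a confounder, the function names, and the "attention mass" score are all hypothetical choices made here for illustration, and the parameters are random rather than trained.

```python
import numpy as np


def attention_pool(tile_features, V, w):
    """Pool tile features into one slide embedding with softmax attention.

    Assumed form: weight_k = softmax_k(w . tanh(V^T h_k)).
    """
    scores = np.tanh(tile_features @ V) @ w        # shape: (n_tiles,)
    scores = scores - scores.max()                 # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    slide_embedding = weights @ tile_features      # shape: (n_features,)
    return slide_embedding, weights


def add_synthetic_confounder(tile_features, fraction, offset, rng):
    """Shift a random fraction of tiles by a constant offset.

    Hypothetical stand-in for a synthetic tile modification; the paper
    modifies tiles, whereas this sketch perturbs their features directly.
    """
    n_tiles = tile_features.shape[0]
    n_confounded = max(1, int(round(fraction * n_tiles)))
    idx = rng.choice(n_tiles, size=n_confounded, replace=False)
    mask = np.zeros(n_tiles, dtype=bool)
    mask[idx] = True
    modified = tile_features.copy()
    modified[mask] += offset
    return modified, mask


def attention_mass_on_confounders(weights, mask):
    """Illustrative score: total attention assigned to confounded tiles."""
    return float(weights[mask].sum())


# Toy slide: 100 tiles with 16 features each, random (untrained) parameters.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))
V = rng.normal(size=(16, 8))
w = rng.normal(size=8)

confounded, mask = add_synthetic_confounder(features, 0.1, 3.0, rng)
embedding, weights = attention_pool(confounded, V, w)
mass = attention_mass_on_confounders(weights, mask)
print(f"attention mass on confounded tiles: {mass:.3f}")
```

With a trained model, a score of this kind could be compared against the fraction of confounded tiles (0.1 in this toy example) to judge whether the attention map highlights the confounder more than chance would suggest.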
