Probabilistic Stability Guarantees for Feature Attributions

  • 2025-04-18 17:39:08
  • Helen Jin, Anton Xue, Weiqiu You, Surbhi Goel, Eric Wong

Abstract

Stability guarantees are an emerging tool for evaluating feature attributions, but existing certification methods rely on smoothed classifiers and often yield conservative guarantees. To address these limitations, we introduce soft stability and propose a simple, model-agnostic, and sample-efficient stability certification algorithm (SCA) that provides non-trivial and interpretable guarantees for any attribution. Moreover, we show that mild smoothing enables a graceful tradeoff between accuracy and stability, in contrast to prior certification methods that require a more aggressive compromise. Using Boolean function analysis, we give a novel characterization of stability under smoothing. We evaluate SCA on vision and language tasks, and demonstrate the effectiveness of soft stability in measuring the robustness of explanation methods.
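
The abstract does not spell out how SCA works, so the following is only a rough sketch of what a sampling-based, model-agnostic stability estimate could look like, not the authors' algorithm. It assumes that stability is measured as the fraction of random perturbations (revealing up to `radius` additional features beyond the attribution's selected ones) that leave the prediction unchanged, and that the number of samples is chosen with a standard Hoeffding bound. The names `estimate_stability_rate`, `hoeffding_sample_size`, `predict`, `selected`, and `radius`, the zero-masking scheme, and the sampling distribution are all hypothetical choices made for illustration.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Number of i.i.d. samples so that an empirical rate is within
    epsilon of the true rate with probability at least 1 - delta
    (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_stability_rate(predict, x, selected, radius,
                            epsilon=0.1, delta=0.1, seed=0):
    """Monte Carlo estimate of how often the prediction on the masked
    input stays the same when up to `radius` extra features are revealed.

    Assumptions (not from the paper): unselected features are masked
    to 0, and a perturbation is drawn by picking a size k uniformly in
    [0, radius] and then k unselected features uniformly at random.
    """
    rng = random.Random(seed)
    n = len(x)
    selected = set(selected)
    unselected = [i for i in range(n) if i not in selected]

    def masked(keep):
        # Keep the chosen features, zero out everything else.
        return [x[i] if i in keep else 0 for i in range(n)]

    base_label = predict(masked(selected))
    num_samples = hoeffding_sample_size(epsilon, delta)

    unchanged = 0
    for _ in range(num_samples):
        k = rng.randint(0, min(radius, len(unselected)))
        extra = rng.sample(unselected, k)
        if predict(masked(selected | set(extra))) == base_label:
            unchanged += 1
    return unchanged / num_samples, num_samples
```

The estimator only calls `predict` as a black box, which is what makes a procedure of this kind model-agnostic. The sample count depends on `epsilon` and `delta` alone, not on the input dimension.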

 
