On the Robustness of Global Feature Effect Explanations

Abstract

We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best- and worst-case scenarios of (mis)interpreting machine learning predictions globally.
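To make the notion of a data perturbation concrete, the following is a minimal sketch and not the paper's method or its bounds. It estimates a partial dependence curve by Monte Carlo averaging and compares the curve before and after shifting the background data. The model `toy_model`, the helper `partial_dependence`, the grid, and the unit shift are all hypothetical choices made for illustration.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Monte Carlo estimate of the partial dependence of `predict` on one feature."""
    X = np.asarray(X, dtype=float)
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # fix the feature of interest at the grid value
        pd_values.append(predict(X_mod).mean())  # average over the remaining features
    return np.array(pd_values)


def toy_model(X):
    # Hypothetical black box with a pure interaction between features 0 and 1.
    return X[:, 0] * X[:, 1]


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-1.0, 1.0, 5)

# Explanation computed on the original data.
pd_original = partial_dependence(toy_model, X, feature=0, grid=grid)

# Assumed perturbation: shift the distribution of the other feature by one unit.
X_shifted = X.copy()
X_shifted[:, 1] += 1.0
pd_shifted = partial_dependence(toy_model, X_shifted, feature=0, grid=grid)

# Largest pointwise change in the explanation caused by the data shift.
max_gap = np.abs(pd_shifted - pd_original).max()
print(pd_original, pd_shifted, max_gap)
```

For this toy model the partial dependence at grid value v equals v times the mean of feature 1, so shifting that mean by one unit changes the curve by v at every grid point. The model itself is unchanged, yet the explanation moves, which is the kind of sensitivity the abstract refers to.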
