Community Detection on Model Explanation Graphs for Explainable AI

Abstract

Feature-attribution methods (e.g., SHAP, LIME) explain individual predictions but often miss higher-order structure: sets of features that act in concert. We propose Modules of Influence (MoI), a framework that (i) constructs a model explanation graph from per-instance attributions, (ii) applies community detection to find feature modules that jointly affect predictions, and (iii) quantifies how these modules relate to bias, redundancy, and causality patterns. Across synthetic and real datasets, MoI uncovers correlated feature groups, improves model debugging via module-level ablations, and localizes bias exposure to specific modules. We release stability and synergy metrics, a reference implementation, and evaluation protocols to benchmark module discovery in XAI.
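To make steps (i) and (ii) concrete, the following is a minimal sketch of one way such a pipeline could look; it is not the reference implementation. It assumes that edges of the explanation graph are defined by the absolute Pearson correlation between per-instance attribution columns, that an edge is kept when this value reaches a hypothetical threshold of 0.8, and that connected components serve as a simple stand-in for a real community detection algorithm. The function names (`pearson`, `explanation_graph`, `modules`) and the toy attribution matrix are illustrative only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def explanation_graph(attributions, threshold=0.8):
    """Features are nodes; two features are linked when the absolute
    correlation of their attribution columns is >= threshold.
    (Assumed edge definition, chosen for illustration.)"""
    n_features = len(attributions[0])
    columns = [[row[j] for row in attributions] for j in range(n_features)]
    graph = {j: set() for j in range(n_features)}
    for i, j in combinations(range(n_features), 2):
        if abs(pearson(columns[i], columns[j])) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def modules(graph):
    """Connected components, used here as a simple stand-in for
    community detection (not a modularity-based method)."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(component))
    return result


# Toy attribution matrix: rows are instances, columns are features.
# Features 0 and 1 move together; features 2 and 3 move in opposition.
toy_attributions = [
    [1.0, 0.9, 1.0, -0.9],
    [-1.0, -0.8, 1.0, -1.1],
    [1.0, 1.1, -1.0, 1.0],
    [-1.0, -1.0, -1.0, 0.8],
    [1.0, 0.8, 0.0, 0.1],
    [-1.0, -1.0, 0.0, 0.1],
]

found_modules = modules(explanation_graph(toy_attributions))
print(found_modules)
```

In practice the attribution matrix would come from a method such as SHAP or LIME, and the connected-components step would likely be replaced by a weighted community detection algorithm, since thresholding discards edge strength.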
