Evaluating Feature Importance Estimates

Abstract

Estimating the influence of a given feature on a model prediction is challenging. We introduce ROAR, RemOve And Retrain, a benchmark to evaluate the accuracy of interpretability methods that estimate input feature importance in deep neural networks. We remove a fraction of input features deemed to be most important according to each estimator and measure the change to the model accuracy upon retraining. The most accurate estimator will identify inputs as important whose removal causes the most damage to model performance relative to all other estimators. This evaluation produces thought-provoking results: we find that several estimators are less accurate than a random assignment of feature importance. However, averaging a set of squared noisy estimators (a variant of a technique proposed by Smilkov et al. (2017)) leads to significant gains in accuracy for each method considered and far outperforms such a random guess.
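
The two operations named in the abstract, removing the highest-ranked features and averaging squared noisy estimates, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the constant `fill_value` used to replace removed features, and the defaults for `n_samples` and `noise_std` are assumptions made for illustration, and the retraining step of the benchmark is left out.

```python
import numpy as np


def remove_top_features(inputs, importance, fraction, fill_value=0.0):
    """Replace the top `fraction` of features in each example with `fill_value`.

    `inputs` and `importance` have the same shape, with examples on axis 0.
    Features are ranked per example by their importance score.
    `fill_value` is an assumed stand-in for an uninformative replacement value.
    """
    inputs = np.asarray(inputs, dtype=float)
    importance = np.asarray(importance, dtype=float)
    if inputs.shape != importance.shape:
        raise ValueError("inputs and importance must have the same shape")

    n_examples = inputs.shape[0]
    flat_inputs = inputs.reshape(n_examples, -1).copy()
    flat_importance = importance.reshape(n_examples, -1)

    n_remove = int(round(fraction * flat_inputs.shape[1]))
    if n_remove > 0:
        # Indices of the n_remove highest-scoring features in each example.
        order = np.argsort(-flat_importance, axis=1, kind="stable")
        top = order[:, :n_remove]
        np.put_along_axis(flat_inputs, top, fill_value, axis=1)
    return flat_inputs.reshape(inputs.shape)


def squared_noisy_average(estimator, x, n_samples=15, noise_std=0.15, seed=0):
    """Average the squared output of `estimator` over noisy copies of `x`.

    `estimator` is any callable mapping an input array to an importance
    array of the same shape. The defaults are illustrative, not tuned values.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        total += np.asarray(estimator(noisy), dtype=float) ** 2
    return total / n_samples
```

In a full benchmark run, a fresh model would be trained on the output of `remove_top_features` for each estimator and each removal fraction, and its test accuracy compared against the accuracy obtained when features are removed according to a random ranking.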
