Abstract
As opaque black-box predictive models become more prevalent, there is growing interest in developing interpretations for these models. Variable importance and Shapley values are interpretability measures that apply to any predictive model and assess how much a variable or set of variables improves prediction performance. When the number of variables is large, estimating variable importance presents a significant computational challenge because re-training neural networks or other black-box algorithms requires significant additional computation. In this paper, we address this challenge for algorithms using gradient descent and gradient boosting (e.g., neural networks, gradient-boosted decision trees). By combining early stopping of gradient-based methods with a warm start based on the dropout method, we develop a scalable method to estimate variable importance for any algorithm that can be expressed as an iterative kernel update equation. Importantly, we provide theoretical guarantees by using the theory of early stopping for kernel-based methods, covering neural networks with sufficiently large (but not necessarily infinite) width and gradient-boosted decision trees that use symmetric trees as weak learners. We also demonstrate the efficacy of our methods through simulations and a real data example, which illustrate the computational benefit of early stopping over fully re-training the model, as well as the increased accuracy of our approach.
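To make the idea of warm start with dropout plus early stopping concrete, the following is a minimal sketch on a toy linear model trained by gradient descent. It is an illustration under simplifying assumptions, not the algorithm analyzed in the paper: the helper names (`gd_fit`, `drop_column`, `warm_start_importance`), the synthetic data, the step size, and the number of steps are all hypothetical, and "dropout" is assumed here to mean replacing a variable with its training mean.

```python
import numpy as np

def gd_fit(X, y, w_init, lr=0.1, n_steps=100):
    """Gradient descent on mean squared error for a linear model, starting at w_init."""
    w = w_init.copy()
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def mse(X, y, w):
    """Mean squared prediction error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

def drop_column(X, j, fill):
    """Return a copy of X with column j replaced by a constant (assumed form of 'dropout')."""
    X_d = X.copy()
    X_d[:, j] = fill
    return X_d

def warm_start_importance(X_tr, y_tr, X_te, y_te, w_full, j, lr=0.1, n_steps=20):
    """Importance of variable j: test loss of the reduced model minus test loss of the full model.

    The reduced model is not re-trained from scratch. It starts at the
    full-model weights (warm start) and takes only n_steps gradient steps
    (early stopping) on the data with variable j dropped.
    """
    fill = X_tr[:, j].mean()
    X_tr_d = drop_column(X_tr, j, fill)
    X_te_d = drop_column(X_te, j, fill)
    w_red = gd_fit(X_tr_d, y_tr, w_full, lr=lr, n_steps=n_steps)
    return mse(X_te_d, y_te, w_red) - mse(X_te, y_te, w_full)

# Hypothetical synthetic data: only the first two variables affect the response.
rng = np.random.default_rng(0)
n, p = 400, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

# Train the full model once, then reuse its weights for every reduced model.
w_full = gd_fit(X_tr, y_tr, np.zeros(p), n_steps=500)
importances = [
    warm_start_importance(X_tr, y_tr, X_te, y_te, w_full, j) for j in range(p)
]
print(importances)
```

In this sketch the full model is trained once, and each reduced model costs only 20 extra gradient steps instead of a full 500-step re-training, which is the computational saving the abstract refers to. The linear model stands in for a neural network or boosted ensemble purely to keep the example short.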