Efficient estimation of the ANOVA mean dimension, with an application to neural net classification

Abstract

The mean dimension of a black box function of $d$ variables is a convenientway to summarize the extent to which it is dominated by high or low orderinteractions. It is expressed in terms of $2^d-1$ variance components but itcan be written as the sum of $d$ Sobol' indices that can be estimated by leaveone out methods. We compare the variance of these leave one out methods: aGibbs sampler called winding stairs, a radial sampler that changes eachvariable one at a time from a baseline, and a naive sampler that never reusesfunction evaluations and so costs about double the other methods. For anadditive function the radial and winding stairs are most efficient. For amultiplicative function the naive method can easily be most efficient if thefactors have high kurtosis. As an illustration we consider the mean dimensionof a neural network classifier of digits from the MNIST data set. Theclassifier is a function of $784$ pixels. For that problem, winding stairs isthe best algorithm. We find that inputs to the final softmax layer have meandimensions ranging from $1.35$ to $2.0$.

Quick Read (beta)

loading the full paper ...