On the importance of single directions for generalization

Abstract

Despite their ability to memorize large datasets, deep neural networks often achieve good generalization performance. However, the differences between the learned solutions of networks which generalize and those which do not remain unclear. Additionally, the tuning properties of single directions (defined as the activation of a single unit or some linear combination of units in response to some input) have been highlighted, but their importance has not been evaluated. Here, we connect these lines of inquiry to demonstrate that a network's reliance on single directions is a good predictor of its generalization performance, across networks trained on datasets with different fractions of corrupted labels, across ensembles of networks trained on datasets with unmodified labels, across different hyperparameters, and over the course of training. While dropout only regularizes this quantity up to a point, batch normalization implicitly discourages single direction reliance, in part by decreasing the class selectivity of individual units. Finally, we find that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.
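
As a rough illustration of the two quantities discussed above, the sketch below shows one way reliance on single directions and class selectivity could be measured for a single hidden layer. It is not taken from the abstract. It assumes that a direction is a single unit, that ablation means clamping that unit's activation to zero, and that selectivity is computed as (mu_max - mu_rest) / (mu_max + mu_rest) over class-conditional mean activations. The function names `ablation_curve` and `class_selectivity` and the linear `readout` matrix are hypothetical.

```python
import numpy as np


def ablation_curve(hidden, readout, labels, order):
    """Accuracy after cumulatively zeroing hidden units in the given order.

    hidden:  (n_samples, n_units) activations of one layer (assumed setup)
    readout: (n_units, n_classes) linear map from that layer to class scores
    labels:  (n_samples,) integer class labels
    order:   sequence of unit indices to ablate, one at a time
    """
    h = hidden.astype(float)  # astype returns a copy, so the input is not modified
    accs = [np.mean((h @ readout).argmax(axis=1) == labels)]
    for unit in order:
        h[:, unit] = 0.0  # assumption: ablate by clamping the activation to zero
        accs.append(np.mean((h @ readout).argmax(axis=1) == labels))
    return np.array(accs)


def class_selectivity(hidden, labels, eps=1e-12):
    """Per-unit selectivity index, assuming non-negative activations.

    Returns values near 0 for units that respond equally to all classes
    and near 1 for units that respond to a single class only.
    """
    classes = np.unique(labels)
    # Class-conditional mean activation of every unit: shape (n_classes, n_units)
    means = np.stack([hidden[labels == c].mean(axis=0) for c in classes])
    mu_max = means.max(axis=0)
    # Mean activation over all classes except the preferred one
    mu_rest = (means.sum(axis=0) - mu_max) / (len(classes) - 1)
    return (mu_max - mu_rest) / (mu_max + mu_rest + eps)
```

Under these assumptions, a network that relies heavily on single directions would show accuracy falling quickly as units are ablated, while a more robust network would degrade slowly. Comparing the accuracy drop from ablating each unit alone with that unit's selectivity index is one way to probe whether selectivity predicts importance.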
