Distributional Generalization: A New Kind of Generalization

Abstract

We introduce a new notion of generalization, Distributional Generalization, which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error. For example, if we mislabel 30% of dogs as cats in the train set of CIFAR-10, then a ResNet trained to interpolation will in fact mislabel roughly 30% of dogs as cats on the *test set* as well, while leaving other classes unaffected. This behavior is not captured by classical generalization, which would only consider the average error and not the distribution of errors over the input domain. This example is a specific instance of our much more general conjectures, which apply even on distributions where the Bayes risk is zero. Our conjectures characterize the form of distributional generalization that can be expected, in terms of problem parameters (model architecture, training procedure, number of samples, data distribution). We verify the quantitative predictions of these conjectures across a variety of domains in machine learning, including neural networks, kernel machines, and decision trees. These empirical observations are independently interesting, and form a more fine-grained characterization of interpolating classifiers beyond just their test error.
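
The label-noise example can be illustrated with a toy sketch. This is not the CIFAR-10/ResNet experiment: it assumes three synthetic 2-D Gaussian clusters standing in for "dog", "cat", and "other", and swaps in a 1-nearest-neighbour classifier as a simple interpolating classifier. All names (`make_blobs`, `one_nn_predict`, `run_experiment`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

DOG, CAT, OTHER = 0, 1, 2  # stand-in class indices (assumption, not CIFAR-10)


def make_blobs(n_per_class, centers, rng):
    """Sample n_per_class unit-variance Gaussian points around each center."""
    xs = [c + rng.standard_normal((n_per_class, 2)) for c in centers]
    ys = [np.full(n_per_class, k) for k in range(len(centers))]
    return np.concatenate(xs), np.concatenate(ys)


def one_nn_predict(x_train, y_train, x_test):
    """Return the label of the nearest training point for each test point."""
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, avoiding a 3-D temporary.
    d2 = (
        (x_test ** 2).sum(axis=1)[:, None]
        + (x_train ** 2).sum(axis=1)[None, :]
        - 2.0 * x_test @ x_train.T
    )
    return y_train[d2.argmin(axis=1)]


def run_experiment(flip_prob=0.3, n_train=1000, n_test=500, seed=0):
    rng = np.random.default_rng(seed)
    # Well-separated clusters, so the clean problem is essentially noiseless.
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    x_train, y_clean = make_blobs(n_train, centers, rng)
    x_test, y_test = make_blobs(n_test, centers, rng)

    # Relabel each training dog as a cat independently with probability flip_prob.
    y_noisy = y_clean.copy()
    flip = (y_clean == DOG) & (rng.random(len(y_clean)) < flip_prob)
    y_noisy[flip] = CAT

    # 1-NN reproduces the noisy training labels, so it interpolates the train set.
    pred = one_nn_predict(x_train, y_noisy, x_test)
    return {
        "train_dog_as_cat": float(np.mean(y_noisy[y_clean == DOG] == CAT)),
        "test_dog_as_cat": float(np.mean(pred[y_test == DOG] == CAT)),
        "test_other_as_cat": float(np.mean(pred[y_test == OTHER] == CAT)),
    }


result = run_experiment()
print(result)
```

In this toy setting the fraction of test dogs predicted as cats should land near the training flip rate, while the "other" class is essentially never predicted as cat. Average test error alone would not reveal which class the errors fall on, or which label they are mapped to.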
