The Deep Bootstrap: Good Online Learners are Good Offline Generalizers

  • 2020-10-16 03:07:49
  • Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi

Abstract

We propose a new framework for reasoning about generalization in deep learning. The core idea is to couple the Real World, where optimizers take stochastic gradient steps on the empirical loss, to an Ideal World, where optimizers take steps on the population loss. This leads to an alternate decomposition of test error into: (1) the Ideal World test error plus (2) the gap between the two worlds. If the gap (2) is universally small, this reduces the problem of generalization in offline learning to the problem of optimization in online learning. We then give empirical evidence that this gap between worlds can be small in realistic deep learning settings, in particular supervised image classification. For example, CNNs generalize better than MLPs on image distributions in the Real World, but this is "because" they optimize faster on the population loss in the Ideal World. This suggests our framework is a useful tool for understanding generalization in deep learning, and lays a foundation for future research in the area.
