Reconciling modern machine learning and the bias-variance trade-off

Abstract

The question of generalization in machine learning (how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample) is revisited in light of the recent breakthroughs in modern machine learning technology. The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample. However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data. How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice? In this paper, we bridge the two regimes by exhibiting a new "double descent" risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation. Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample (a point that we call the "interpolation threshold"), the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models. The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited.
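
To make the "interpolation threshold" concrete, here is a minimal sketch, assuming a random-features regression model and minimum-norm least squares as the "suitably chosen" interpolating predictor. The ReLU features, linear target, noise level, feature counts, and the function names `relu_features`, `min_norm_fit`, and `risk_curve` are all hypothetical choices for illustration, not the experimental setup of the paper. Model complexity is the number of random features N; once N reaches the number of training points n, the fit interpolates the training sample (training error drops to numerical zero), and the test error on the far side of that threshold can then be inspected.

```python
import numpy as np


def relu_features(X, W):
    """Random ReLU features: column k is max(0, <w_k, x>) for a random weight w_k."""
    return np.maximum(0.0, X @ W.T)


def min_norm_fit(Phi, y):
    """Least-squares fit via np.linalg.lstsq.

    When there are more features than training points (N > n), lstsq returns the
    minimum-norm solution among all exact fits, i.e. a minimum-norm interpolant.
    """
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coef


def risk_curve(n_train=40, n_test=1000, d=10, noise=0.5,
               feature_counts=(5, 10, 20, 30, 40, 50, 80, 160, 400), seed=0):
    """Return (N, train_mse, test_mse) for each number of random features N."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d) / np.sqrt(d)  # hypothetical linear target
    X_tr = rng.standard_normal((n_train, d))
    X_te = rng.standard_normal((n_test, d))
    y_tr = X_tr @ beta + noise * rng.standard_normal(n_train)  # noisy training labels
    y_te = X_te @ beta + noise * rng.standard_normal(n_test)

    curve = []
    for N in feature_counts:
        W = rng.standard_normal((N, d)) / np.sqrt(d)  # fresh random features for each N
        coef = min_norm_fit(relu_features(X_tr, W), y_tr)
        train_mse = np.mean((relu_features(X_tr, W) @ coef - y_tr) ** 2)
        test_mse = np.mean((relu_features(X_te, W) @ coef - y_te) ** 2)
        curve.append((N, float(train_mse), float(test_mse)))
    return curve


if __name__ == "__main__":
    # Interpolation threshold here is N == n_train == 40; past it, train MSE is ~0.
    for N, tr, te in risk_curve():
        print(f"N={N:4d}  train MSE={tr:.2e}  test MSE={te:.3f}")
```

Plotting the test MSE against N often shows a spike near N = n followed by a second decrease, but how pronounced this is depends on the seed, noise level, and feature distribution in this toy setup. Minimum-norm least squares is used because it picks one specific interpolant out of the infinitely many that exist when N > n.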
