Abstract
Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires characterizing the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural networks via dynamical mean field theory, a well-established technique of non-equilibrium statistical physics. We show that, for large network width $m$ and large number of samples per input dimension $n/d$, the training dynamics exhibits a separation of timescales which implies: $(i)$~the emergence of a slow timescale associated with the growth in Gaussian/Rademacher complexity of the network; $(ii)$~an inductive bias towards small complexity if the initialization has small enough complexity; $(iii)$~a dynamical decoupling between feature learning and overfitting regimes; $(iv)$~a non-monotone behavior of the test error, associated with a `feature unlearning' regime at large times.