Comparing Dynamics: Deep Neural Networks versus Glassy Systems

  • 2018-03-19 14:59:01
  • M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli
  • 16

Abstract

We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are the complexity of the loss-landscape and of the dynamics within it, and to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.

 
