Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

  • 2021-10-11 17:17:20
  • Jiayao Zhang, Hua Wang, Weijie J. Su

Abstract

Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent? In this study, we model the evolution of features during deep learning training using a set of stochastic differential equations (SDEs), each of which corresponds to a training sample. As a crucial ingredient in our modeling strategy, each SDE contains a drift term that reflects the impact of backpropagation at an input on the features of all samples. Our main finding uncovers a sharp phase transition phenomenon regarding the intra-class impact: if the SDEs are locally elastic in the sense that the impact is more significant on samples from the same class as the input, the features of the training data become linearly separable, meaning vanishing training loss; otherwise, the features are not separable, regardless of how long the training time is. Moreover, in the presence of local elasticity, an analysis of our SDEs shows the emergence of a simple geometric structure called the neural collapse of the features. Taken together, our results shed light on the decisive role of local elasticity in the training dynamics of neural networks. We corroborate our theoretical analysis with experiments on a synthesized dataset of geometric shapes and CIFAR-10.
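The qualitative claim in the abstract can be illustrated with a small simulation. The sketch below is a toy two-class model, not the paper's actual equations: the function names, the per-class push directions, and the parameters `alpha` (same-class impact), `beta` (cross-class impact), `sigma`, and `dt` are all assumptions made for illustration. At each step a random training sample is drawn, and every feature is nudged along that sample's class direction, with strength `alpha` if it shares the class and `beta` otherwise, plus Gaussian noise (an Euler-Maruyama discretization).

```python
import numpy as np


def simulate_features(alpha, beta, n_per_class=20, steps=2000,
                      dt=0.01, sigma=0.5, seed=0):
    """Toy Euler-Maruyama simulation of two-class feature dynamics.

    alpha: impact of an update on samples of the SAME class as the input.
    beta:  impact on samples of the OTHER class.
    All names and values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_class)
    # Hypothetical directions in which a class-0 / class-1 input pushes features.
    directions = np.array([[1.0, 0.0], [-1.0, 0.0]])
    x = np.zeros((labels.size, 2))
    for _ in range(steps):
        j = rng.integers(labels.size)  # sample drawn at this step
        strength = np.where(labels == labels[j], alpha, beta)
        drift = strength[:, None] * directions[labels[j]]
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x += drift * dt + noise
    return x, labels


def separation_score(x, labels):
    """Distance between class means divided by average within-class spread."""
    mean0 = x[labels == 0].mean(axis=0)
    mean1 = x[labels == 1].mean(axis=0)
    spread = np.mean([x[labels == c].std(axis=0).mean() for c in (0, 1)])
    return np.linalg.norm(mean0 - mean1) / spread


if __name__ == "__main__":
    x_le, y = simulate_features(alpha=1.0, beta=0.2)  # locally elastic
    x_no, _ = simulate_features(alpha=1.0, beta=1.0)  # no local elasticity
    print("alpha > beta:", separation_score(x_le, y))
    print("alpha = beta:", separation_score(x_no, y))
```

In this toy setting the gap between class means grows in proportion to `(alpha - beta)` times the elapsed time, while the noise spread grows only with its square root, so the classes pull apart when `alpha > beta` and stay mixed when `alpha == beta`. This mirrors the separable versus non-separable dichotomy described above only qualitatively.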

 
