How Many Factors Influence Minima in SGD?

  • 2020-09-24 17:58:46
  • Victor Luo, Yazhen Wang

Abstract

Stochastic gradient descent (SGD) is often applied to train deep neural networks (DNNs), and research efforts have been devoted to investigating the convergent dynamics of SGD and the minima found by SGD. The influencing factors identified in the literature include the learning rate, batch size, Hessian, and gradient covariance. Stochastic differential equations are used to model SGD and to establish the relationships among these factors that characterize the minima found by SGD. It has been found that the ratio of batch size to learning rate is a main factor in highlighting the underlying SGD dynamics; however, the influence of other important factors, such as the Hessian and gradient covariance, is not entirely agreed upon. This paper describes the factors and relationships in the recent literature and presents numerical findings on these relationships. In particular, it confirms the four-factor and general relationship results obtained in Wang (2019), while the three-factor and associated relationship results found in Jastrzębski et al. (2018) may not hold beyond the special case considered there.
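As a toy illustration of how the four factors can interact, consider SGD on a one-dimensional quadratic loss L(θ) = hθ²/2 with Hessian h. Assume (for illustration only) that each minibatch gradient is hθ + ξ, with Gaussian noise ξ of variance σ²/B. This is not the paper's model or experiments. The update θ ← (1 − ηh)θ − ηξ then has stationary variance ησ²/(Bh(2 − ηh)). That variance depends on the learning rate η and batch size B mainly through η/B when ηh is small, but it also depends on the Hessian h and the gradient variance σ². The function names below are hypothetical.

```python
import numpy as np


def sgd_stationary_variance(lr, batch_size, hessian, grad_var,
                            steps=100_000, burn_in=10_000, seed=0):
    """Simulate SGD on the toy loss L(theta) = hessian * theta**2 / 2.

    Assumption: each step's gradient is hessian * theta + xi, where xi is
    Gaussian with variance grad_var / batch_size (a toy minibatch-noise model).
    Returns the empirical variance of theta after discarding burn_in steps.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(grad_var / batch_size), size=steps)
    theta = 0.0
    samples = np.empty(steps - burn_in)
    for k in range(steps):
        # One SGD step with the noisy gradient.
        theta -= lr * (hessian * theta + noise[k])
        if k >= burn_in:
            samples[k - burn_in] = theta
    return samples.var()


def predicted_variance(lr, batch_size, hessian, grad_var):
    """Exact stationary variance of theta <- (1 - lr*h) * theta - lr * xi.

    Valid only when |1 - lr * hessian| < 1 (the iteration is stable).
    """
    return lr * grad_var / (batch_size * hessian * (2.0 - lr * hessian))


if __name__ == "__main__":
    for lr, b in [(0.1, 32), (0.2, 64)]:
        emp = sgd_stationary_variance(lr, b, hessian=1.0, grad_var=1.0)
        pred = predicted_variance(lr, b, hessian=1.0, grad_var=1.0)
        print(f"lr={lr}, B={b}, lr/B={lr / b:.5f}: "
              f"empirical={emp:.6f}, predicted={pred:.6f}")
```

With h = σ² = 1, the settings (η, B) = (0.1, 32) and (0.2, 64) share the same ratio η/B. Their predicted variances still differ by roughly 6% because of the (2 − ηh) factor, which involves the Hessian. In this toy setting, the ratio alone does not fully determine the behavior.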
