An Improved Analysis of Training Over-parameterized Deep Neural Networks

  • 2019-06-11 16:40:46
  • Difan Zou, Quanquan Gu
  • 4

Abstract

A recent line of research has shown that gradient-based algorithms withrandom initialization can converge to the global minima of the training lossfor over-parameterized (i.e., sufficiently wide) deep neural networks. However,the condition on the width of the neural network to ensure the globalconvergence is very stringent, which is often a high-degree polynomial in thetraining sample size $n$ (e.g., $O(n^{24})$). In this paper, we provide animproved analysis of the global convergence of (stochastic) gradient descentfor training deep neural networks, which only requires a milderover-parameterization condition than previous work in terms of the trainingsample size and other problem-dependent parameters. The main technicalcontributions of our analysis include (a) a tighter gradient lower bound thatleads to a faster convergence of the algorithm, and (b) a sharpercharacterization of the trajectory length of the algorithm. By specializing ourresult to two-layer (i.e., one-hidden-layer) neural networks, it also providesa milder over-parameterization condition than the best-known result in priorwork.

 

Quick Read (beta)

loading the full paper ...