Learning One-hidden-layer ReLU Networks via Gradient Descent

  • 2018-06-20 15:52:43
  • Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu

Abstract

We study the problem of learning one-hidden-layer neural networks with the Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from a standard Gaussian distribution and the outputs are generated from a noisy teacher network. We analyze the performance of gradient descent for training this kind of neural network based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.
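
The setting described in the abstract can be illustrated with a minimal sketch: Gaussian inputs, a noisy teacher network, and gradient descent on the empirical squared loss. Several details here are assumptions and are not taken from the paper: the second-layer weights are fixed to 1, the sizes `d`, `K`, `n`, the noise level, the step size, and the iteration count are arbitrary, and the tensor initialization is replaced by a small random perturbation of the ground-truth weights as a stand-in for "an initial point close enough to the truth".

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict(W, x):
    # Assumed model: sum of K ReLU units, second-layer weights fixed to 1.
    return sum(max(dot(w, x), 0.0) for w in W)


def empirical_risk(W, X, y):
    # Squared loss averaged over the n samples, with the usual 1/2 factor.
    return sum((predict(W, x) - t) ** 2 for x, t in zip(X, y)) / (2 * len(X))


def gradient(W, X, y):
    # Gradient of the empirical risk with respect to each hidden-unit weight.
    n, d = len(X), len(X[0])
    G = [[0.0] * d for _ in W]
    for x, t in zip(X, y):
        residual = predict(W, x) - t
        for j, w in enumerate(W):
            if dot(w, x) > 0.0:  # ReLU derivative: 1 on active units, else 0
                for k in range(d):
                    G[j][k] += residual * x[k] / n
    return G


def gradient_descent(W0, X, y, step=0.2, iters=150):
    # Plain gradient descent with a constant step size (hypothetical values).
    W = [row[:] for row in W0]
    for _ in range(iters):
        G = gradient(W, X, y)
        W = [[w - step * g for w, g in zip(wr, gr)] for wr, gr in zip(W, G)]
    return W


def distance(W, V):
    # Frobenius distance between two weight matrices.
    return math.sqrt(sum((a - b) ** 2
                         for wr, vr in zip(W, V) for a, b in zip(wr, vr)))


rng = random.Random(0)
d, K, n, noise = 5, 2, 300, 0.01  # illustrative sizes, not from the paper

# Teacher weights, standard Gaussian inputs, and noisy teacher outputs.
W_star = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [predict(W_star, x) + noise * rng.gauss(0, 1) for x in X]

# Stand-in for tensor initialization: start near the ground truth.
W0 = [[w + 0.1 * rng.gauss(0, 1) for w in row] for row in W_star]
W_hat = gradient_descent(W0, X, y)

print("distance to ground truth before:", distance(W0, W_star))
print("distance to ground truth after: ", distance(W_hat, W_star))
```

With a noise level of zero the iterates would be expected to approach the teacher weights exactly; with noise, the remaining gap plays the role of the statistical error mentioned in the abstract.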

 
