Learning One-hidden-layer ReLU Networks via Gradient Descent

Abstract

We study the problem of learning one-hidden-layer neural networks with the Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from the standard Gaussian distribution and the outputs are generated by a noisy teacher network. We analyze the performance of gradient descent for training such networks based on empirical risk minimization and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent converges to the ground-truth parameters at a linear rate, up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.
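
The setting described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's algorithm. Two simplifications are assumptions made here: the second-layer weights are fixed to one, and tensor initialization is replaced by a small random perturbation of the teacher weights. All function names, dimensions, and step sizes are hypothetical choices for illustration only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def predict(W, X):
    # W: (k, d) hidden-layer weights, X: (n, d) inputs.
    # Assumption: second-layer weights are all one, so the output
    # is the plain sum of the k hidden ReLU units.
    return relu(X @ W.T).sum(axis=1)

def empirical_risk(W, X, y):
    # Squared-loss empirical risk over the n samples.
    residual = predict(W, X) - y
    return 0.5 * np.mean(residual ** 2)

def risk_gradient(W, X, y):
    # Gradient of empirical_risk with respect to W (valid away from
    # the ReLU kinks, which have probability zero for Gaussian inputs).
    Z = X @ W.T                          # pre-activations, shape (n, k)
    residual = relu(Z).sum(axis=1) - y   # shape (n,)
    return (residual[:, None] * (Z > 0)).T @ X / X.shape[0]

def run_demo(d=10, k=3, n=5000, noise_std=0.01, step=0.3, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    W_star = rng.standard_normal((k, d))       # teacher (ground-truth) weights
    X = rng.standard_normal((n, d))            # standard Gaussian inputs
    y = predict(W_star, X) + noise_std * rng.standard_normal(n)  # noisy teacher
    # Stand-in for tensor initialization (NOT implemented here):
    # start from a small perturbation of the teacher weights.
    W = W_star + 0.1 * rng.standard_normal((k, d))
    errors = [np.linalg.norm(W - W_star)]
    for _ in range(iters):
        W = W - step * risk_gradient(W, X, y)  # plain gradient descent step
        errors.append(np.linalg.norm(W - W_star))
    return errors

if __name__ == "__main__":
    errs = run_demo()
    print("initial error:", errs[0], "final error:", errs[-1])
```

Under these assumptions, the parameter error would be expected to shrink quickly and then level off at a floor set by the noise and the sample size, which is the qualitative behavior the abstract describes as linear convergence up to some statistical error.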
