Learning ReLU Networks via Alternating Minimization

  • 2018-06-20 17:43:38
  • Gauri Jagatap, Chinmay Hegde
  • 3

Abstract

We propose and analyze a new family of algorithms for training neuralnetworks with ReLU activations. Our algorithms are based on the technique ofalternating minimization: estimating the activation patterns of each ReLU forall given samples, interleaved with weight updates via a least-squares step. Weconsider three different cases of this model: (i) a single ReLU; (ii) 1-hiddenlayer networks with $k$ hidden ReLUs (iii) 2-hidden layer networks. We showthat under standard distributional assumptions on the input data, our algorithmprovably recovers the true "ground truth" parameters in a linearly convergentfashion; furthermore, our method exhibits requires only $O(d)$ samples for thesingle ReLU case and $\widetilde{O}(dk^2)$ samples in the 1-hidden layer case.We also extend this framework to deeper networks, and empirically demonstrateits convergence to a global minimum.

 

Quick Read (beta)

loading the full paper ...