Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations

Abstract

We present polynomial time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = {a}^{\mathsf{T}}\sigma({W}^\mathsf{T}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.
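To make the model class concrete, the following is a minimal NumPy sketch of evaluating $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$ on Gaussian inputs. It illustrates only the forward map that generates the training data, not the learning algorithm. The function names, the dimensions `d` and `k`, and the random parameter values are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np


def relu(t):
    """ReLU activation: sigma(t) = max(t, 0), applied elementwise."""
    return np.maximum(t, 0.0)


def depth2_relu_network(x, W, a, b):
    """Evaluate f(x) = a^T relu(W^T x + b) for a batch of inputs.

    x: (n, d) inputs, one sample per row
    W: (d, k) hidden-layer weights
    a: (k,)   output-layer weights
    b: (k,)   hidden-layer biases (possibly nonzero)
    Returns an array of shape (n,).
    """
    # For a single sample, W^T x is (k,); for row-stacked samples this is x @ W.
    return relu(x @ W + b) @ a


# Illustrative sizes and parameters (assumed, not taken from the paper).
rng = np.random.default_rng(0)
d, k, n = 5, 3, 4
W = rng.standard_normal((d, k))
a = rng.standard_normal(k)
b = rng.standard_normal(k)

# Inputs drawn from the standard Gaussian distribution, as in the abstract.
x = rng.standard_normal((n, d))
y = depth2_relu_network(x, W, a, b)
```

Setting `b` to the zero vector recovers the bias-free setting that, according to the abstract, prior works assume.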
