Abstract
We consider the problem of learning an unknown ReLU network with respect toGaussian inputs and obtain the first nontrivial results for networks of depthmore than two. We give an algorithm whose running time is a fixed polynomial inthe ambient dimension and some (exponentially large) function of only thenetwork's parameters. Our bounds depend on the number of hidden units, depth, spectral norm of theweight matrices, and Lipschitz constant of the overall network (we show thatsome dependence on the Lipschitz constant is necessary). We also give a boundthat is doubly exponential in the size of the network but is independent ofspectral norm. These results provably cannot be obtained using gradient-basedmethods and give the first example of a class of efficiently learnable neuralnetworks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higherrequires exponential time in the ambient dimension, even when the aboveparameters are bounded by a constant. Additionally, all prior work for thedepth-two case requires well-conditioned weights and/or positive coefficientsto obtain efficient run-times. Our algorithm does not require theseassumptions. Our main technical tool is a type of filtered PCA that can be used toiteratively recover an approximate basis for the subspace spanned by the hiddenunits in the first layer. Our analysis leverages new structural results onlattice polynomials from tropical geometry.