Large Scale Structure of Neural Network Loss Landscapes

Abstract

There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. High dimensionality plays a key role in our model. Our core idea is to model the loss landscape as a set of high dimensional \emph{wedges} that together form a large-scale, inter-connected structure and towards which optimization is drawn. We first show that hyperparameter choices such as learning rate, network width and $L_2$ regularization affect the path the optimizer takes through the landscape in similar ways, influencing the large scale curvature of the regions the optimizer explores. We then predict and demonstrate new counter-intuitive properties of the loss landscape. We show the existence of low loss subspaces connecting a set (not only a pair) of solutions, and verify it experimentally. Finally, we analyze recently popular ensembling techniques for deep networks in the light of our model.
