Elimination of All Bad Local Minima in Deep Learning

Abstract

In this paper, we theoretically prove that we can eliminate all suboptimal local minima by adding one neuron per output unit to any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function. At every local minimum of any deep neural network with added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to automatically vanish at every local minimum. Unlike many related results in the literature, our theoretical results are directly applicable to common deep learning tasks because the results only rely on the assumptions that automatically hold in the common tasks. Moreover, we discuss several limitations in eliminating the suboptimal local minima in this manner by providing additional theoretical results and several examples.
