Abstract
In this article we present a geometric framework to analyze the convergence of gradient descent trajectories in the context of neural networks. In the case of linear networks with an arbitrary number of hidden layers, we characterize appropriate quantities which are conserved along the gradient descent system (GDS). We use them to prove boundedness of every trajectory of the GDS, which implies convergence to a critical point. We further focus on the local behavior in the neighborhood of each critical point and study the associated basins of attraction so as to measure the "possibility" of converging to saddle points and local minima.