Convergence of the ADAM algorithm from a Dynamical System Viewpoint

Abstract

Adam is a popular variant of stochastic gradient descent for finding a local minimizer of a function. The objective function is unknown, but a random estimate of the current gradient vector is observed at each round of the algorithm. This paper investigates the dynamical behavior of Adam when the objective function is non-convex and differentiable. We introduce a continuous-time version of Adam, in the form of a non-autonomous ordinary differential equation (ODE). The existence and uniqueness of the solution are established, as well as the convergence of the solution towards the stationary points of the objective function. It is also proved that the continuous-time system is a relevant approximation of the Adam iterates, in the sense that the interpolated Adam process converges weakly to the solution of the ODE.
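For readers who want the discrete-time iterates in concrete form, the following is a minimal sketch of the Adam recursion as it is usually stated (Kingma and Ba), driven by a noisy gradient estimate. It is an illustration under stated assumptions, not the formulation analyzed in the paper, which may differ in details such as bias correction or the placement of the epsilon term. The function name `adam_minimize`, the toy objective f(x) = x^2, the noise level, and all hyperparameter values are hypothetical choices made for this example.

```python
import math
import random


def adam_minimize(grad_estimate, x0, steps=2000, lr=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run scalar Adam using only noisy gradient estimates (illustrative sketch)."""
    x = x0
    m = 0.0  # exponential moving average of the gradient estimates
    v = 0.0  # exponential moving average of the squared gradient estimates
    for n in range(1, steps + 1):
        g = grad_estimate(x)  # random estimate of the gradient at x
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for initializing m and v at zero.
        m_hat = m / (1.0 - beta1 ** n)
        v_hat = v / (1.0 - beta2 ** n)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Hypothetical toy problem: f(x) = x^2, whose exact gradient 2x is
# observed with additive Gaussian noise (an assumption for this example).
rng = random.Random(0)


def noisy_grad(x):
    return 2.0 * x + rng.gauss(0.0, 0.1)


x_final = adam_minimize(noisy_grad, x0=1.0)
print(x_final)
```

With a constant step size the iterate does not settle exactly at the minimizer but keeps fluctuating in a neighborhood of it. The toy objective here is convex purely for simplicity, whereas the paper addresses the non-convex case.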
