Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

Abstract

Armijo line-search (Armijo-LS) is a standard method to set the step-size forgradient descent (GD). For smooth functions, Armijo-LS alleviates the need toknow the global smoothness constant L and adapts to the ``local'' smoothness,enabling GD to converge faster. Existing theoretical analyses show that GD withArmijo-LS (GD-LS) can result in constant factor improvements over GD with a 1/Lstep-size (denoted as GD(1/L)). We strengthen these results and show that ifthe objective function satisfies a certain non-uniform smoothness condition,GD-LS can result in a faster convergence rate than GD(1/L). In particular, weprove that for convex objectives corresponding to logistic regression andmulti-class classification, GD-LS can converge to the optimum at a linear rate,and hence improves over the sublinear convergence of GD(1/L). Furthermore, fornon-convex objectives satisfying gradient domination (e.g., those correspondingto the softmax policy gradient in RL or generalized linear models with alogistic link function), GD-LS can match the fast convergence of algorithmstailored for these specific settings. Finally, we prove that under theinterpolation assumption, for convex losses, stochastic GD with a stochasticline-search can match the fast convergence of GD-LS

Quick Read (beta)

loading the full paper ...