Learning-Rate-Free Learning by D-Adaptation

  • 2023-01-18 19:00:50
  • Aaron Defazio, Konstantin Mishchenko
  • 59


The speed of gradient descent for convex Lipschitz functions is highly dependent on the choice of learning rate. Setting the learning rate to achieve the optimal convergence rate requires knowing the distance $D$ from the initial point to the solution set. In this work, we describe a single-loop method, with no back-tracking or line searches, which does not require knowledge of $D$ yet asymptotically achieves the optimal rate of convergence for the complexity class of convex Lipschitz functions. Our approach is the first parameter-free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. Our method is practical, efficient and requires no additional function value or gradient evaluations each step. An open-source implementation is available (https://github.com/facebookresearch/dadaptation).
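To make the dependence on $D$ concrete, the following is a minimal sketch of plain subgradient descent with iterate averaging. It is not D-Adaptation and is not taken from the paper. It assumes a toy objective $f(x) = |x - x_\star|$ with Lipschitz constant $G = 1$, and uses the classical constant step $\gamma = D / (G\sqrt{n})$, for which the averaged iterate satisfies $f(\bar{x}) - f_\star \le DG/\sqrt{n}$. The helper name `subgradient_descent`, the constants, and the mis-scaling factors are all illustrative assumptions.

```python
import math

def subgradient_descent(x0, x_star, step, n):
    """Run n constant-step subgradient steps on f(x) = |x - x_star|.

    Returns the suboptimality f(x_avg) - f*, where x_avg is the mean of
    the iterates x_0, ..., x_{n-1} and f* = 0.
    """
    x = x0
    total = 0.0
    for _ in range(n):
        total += x
        # Subgradient of |x - x_star|: sign(x - x_star), so G = 1.
        g = (x > x_star) - (x < x_star)
        x -= step * g
    x_avg = total / n
    return abs(x_avg - x_star)

# Toy problem (assumed values): start at 0, minimizer at 5, so D = 5.
x0, x_star, G, n = 0.0, 5.0, 1.0, 10_000
D = abs(x0 - x_star)

# Classical step size; computing it requires knowing D in advance.
tuned = D / (G * math.sqrt(n))

# Compare the tuned step against steps that are 100x too small and 30x too large.
gaps = {scale: subgradient_descent(x0, x_star, scale * tuned, n)
        for scale in (0.01, 1.0, 30.0)}

for scale, gap in gaps.items():
    print(f"step = {scale:>5} * D/(G*sqrt(n)): suboptimality = {gap:.4f}")
print(f"worst-case bound D*G/sqrt(n) = {D * G / math.sqrt(n):.4f}")
```

In this sketch the tuned step stays within the $DG/\sqrt{n}$ bound, while both mis-scaled steps end with a larger gap. A method that does not need $D$ as an input removes the need for this kind of tuning.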

