How to trap a gradient flow

  • 2020-01-09 13:30:18
  • S├ębastien Bubeck, Dan Mikulincer
  • 24


We consider the problem of finding an $\varepsilon$-approximate stationarypoint of a smooth function on a compact domain of $\mathbb{R}^d$. In contrastwith dimension-free approaches such as gradient descent, we focus here on thecase where $d$ is finite, and potentially small. This viewpoint was explored in1993 by Vavasis, who proposed an algorithm which, for any fixed finitedimension $d$, improves upon the $O(1/\varepsilon^2)$ oracle complexity ofgradient descent. For example for $d=2$, Vavasis' approach obtains thecomplexity $O(1/\varepsilon)$. Moreover for $d=2$ he also proved a lower boundof $\Omega(1/\sqrt{\varepsilon})$ for deterministic algorithms (we extend thisresult to randomized algorithms). Our main contribution is an algorithm, which we call gradient flow trapping(GFT), and the analysis of its oracle complexity. In dimension $d=2$, GFTcloses the gap with Vavasis' lower bound (up to a logarithmic factor), as weshow that it has complexity$O\left(\sqrt{\frac{\log(1/\varepsilon)}{\varepsilon}}\right)$. In dimension$d=3$, we show a complexity of$O\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$, improving uponVavasis' $O\left(1 / \varepsilon^{1.2} \right)$. In higher dimensions, GFT hasthe remarkable property of being a logarithmic parallel depth strategy, instark contrast with the polynomial depth of gradient descent or Vavasis'algorithm. In this higher dimensional regime, the total work of GFT improvesquadratically upon the only other known polylogarithmic depth strategy for thisproblem, namely naive grid search.


Quick Read (beta)

loading the full paper ...