Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions

Abstract

This paper explores adaptive variance reduction methods for stochasticoptimization based on the STORM technique. Existing adaptive extensions ofSTORM rely on strong assumptions like bounded gradients and bounded functionvalues, or suffer an additional $\mathcal{O}(\log T)$ term in the convergencerate. To address these limitations, we introduce a novel adaptive STORM methodthat achieves an optimal convergence rate of $\mathcal{O}(T^{-1/3})$ fornon-convex functions with our newly designed learning rate strategy. Comparedwith existing approaches, our method requires weaker assumptions and attainsthe optimal convergence rate without the additional $\mathcal{O}(\log T)$ term.We also extend the proposed technique to stochastic compositional optimization,obtaining the same optimal rate of $\mathcal{O}(T^{-1/3})$. Furthermore, weinvestigate the non-convex finite-sum problem and develop another innovativeadaptive variance reduction method that achieves an optimal convergence rate of$\mathcal{O}(n^{1/4} T^{-1/2} )$, where $n$ represents the number of componentfunctions. Numerical experiments across various tasks validate theeffectiveness of our method.

Quick Read (beta)

loading the full paper ...