A near-optimal stochastic gradient method for decentralized non-convex finite-sum optimization

Abstract

This paper describes a $near$-$optimal$ stochastic first-order gradientmethod for decentralized finite-sum minimization of smooth non-convexfunctions. Specifically, we propose GT-SARAH that employs a local SARAH-typevariance reduction and global gradient tracking to address the stochastic anddecentralized nature of the problem. Considering a total number of $N$ costfunctions, equally divided over a directed network of $n$ nodes, we show thatGT-SARAH finds an $\epsilon$-accurate first-order stationary point in${\mathcal{O}(N^{1/2}\epsilon^{-1})}$ gradient computations across all nodes,independent of the network topology, when${n\leq\mathcal{O}(N^{1/2}(1-\lambda)^{3})}$, where ${(1-\lambda)}$ is thespectral gap of the network weight matrix. In this regime, GT-SARAH is thus, tothe best our knowledge, the first decentralized method that achieves thealgorithmic lower bound for this class of problems. Moreover, GT-SARAH achievesa $non$-$asymptotic$ $linear$ $speedup$, in that, the total number of gradientcomputations at each node is reduced by a factor of $1/n$ compared to thenear-optimal algorithms for this problem class that process all data at asingle node. We also establish the convergence rate of GT-SARAH in otherregimes, in terms of the relative sizes of the number of nodes $n$, totalnumber of functions $N$, and the network spectral gap $(1-\lambda)$. Overinfinite time horizon, we establish the almost sure and mean-squaredconvergence of GT-SARAH to a first-order stationary point.

Quick Read (beta)

loading the full paper ...