A Non-Asymptotic Analysis of Network Independence for Distributed Stochastic Gradient Descent

Abstract

This paper is concerned with minimizing the average of $n$ cost functionsover a network, in which agents may communicate and exchange information withtheir peers in the network. Specifically, we consider the setting where onlynoisy gradient information is available. To solve the problem, we study thestandard distributed stochastic gradient descent (DSGD) method and perform anon-asymptotic convergence analysis. For strongly convex and smooth objectivefunctions, we not only show that DSGD asymptotically achieves the optimalnetwork independent convergence rate compared to centralized stochasticgradient descent (SGD), but also explicitly identify the non-asymptoticconvergence rate as a function of characteristics of the objective functionsand the network. Furthermore, we derive the time needed for DSGD to approachthe asymptotic convergence rate, which behaves as$K_T=\mathcal{O}(\frac{n}{(1-\rho_w)^2})$, where $(1-\rho_w)$ denotes thespectral gap of the mixing matrix of communicating agents.

Quick Read (beta)

loading the full paper ...