Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization

  • 2022-01-26 17:26:25
  • Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik
  • 1

Abstract

We present a theoretical study of server-side optimization in federatedlearning. Our results are the first to show that the widely popular heuristicof scaling the client updates with an extra parameter is very useful in thecontext of Federated Averaging (FedAvg) with local passes over the client data.Each local pass is performed without replacement using Random Reshuffling,which is a key reason we can show improved complexities. In particular, weprove that whenever the local stepsizes are small, and the update direction isgiven by FedAvg in conjunction with Random Reshuffling over all clients, onecan take a big leap in the obtained direction and improve rates for convex,strongly convex, and non-convex objectives. In particular, in non-convex regimewe get an enhancement of the rate of convergence from$\mathcal{O}\left(\varepsilon^{-3}\right)$ to$\mathcal{O}\left(\varepsilon^{-2}\right)$. This result is new even for RandomReshuffling performed on a single node. In contrast, if the local stepsizes arelarge, we prove that the noise of client sampling can be controlled by using asmall server-side stepsize. To the best of our knowledge, this is the firsttime that local steps provably help to overcome the communication bottleneck.Together, our results on the advantage of large and small server-side stepsizesgive a formal justification for the practice of adaptive server-sideoptimization in federated learning. Moreover, we consider a variant of ouralgorithm that supports partial client participation, which makes the methodmore practical.

 

Quick Read (beta)

loading the full paper ...