Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

Abstract

Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data collected by local entities. It includes local computation and synchronization steps. To reduce the communication overhead and improve the overall efficiency of FL, gradient sparsification (GS) can be applied, where instead of the full gradient, only a small subset of important elements of the gradient is communicated. Existing work on GS uses a fixed degree of gradient sparsity for i.i.d.-distributed data within a datacenter. In this paper, we consider an adaptive degree of sparsity and non-i.i.d. local datasets. We first present a fairness-aware GS method which ensures that different clients provide a similar amount of updates. Then, with the goal of minimizing the overall training time, we propose a novel online learning formulation and algorithm for automatically determining the near-optimal communication and computation trade-off that is controlled by the degree of gradient sparsity. The online learning algorithm uses an estimated sign of the derivative of the objective function, which gives a regret bound that is asymptotically equal to the case where the exact derivative is available. Experiments with real datasets confirm the benefits of our proposed approaches, showing up to $40\%$ improvement in model accuracy for a finite training time.
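
To make the basic idea of gradient sparsification concrete, the following is a minimal sketch of plain top-k sparsification: only the k largest-magnitude gradient elements are kept for communication. It is an illustration under stated assumptions, not the fairness-aware method or the adaptive online algorithm proposed in the paper. The function names `top_k_sparsify` and `densify` are hypothetical, and k is fixed here, whereas the paper adapts the degree of sparsity during training.

```python
from typing import Dict, List


def top_k_sparsify(gradient: List[float], k: int) -> Dict[int, float]:
    """Keep the k largest-magnitude entries of a gradient vector.

    Returns a sparse {index: value} map; every other entry is treated as zero.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if k <= 0:
        return {}
    k = min(k, len(gradient))
    # Rank indices by absolute value, largest first; ties go to the lower index.
    ranked = sorted(range(len(gradient)), key=lambda i: (-abs(gradient[i]), i))
    return {i: gradient[i] for i in sorted(ranked[:k])}


def densify(sparse: Dict[int, float], dim: int) -> List[float]:
    """Rebuild a dense vector of length dim from the sparse map."""
    dense = [0.0] * dim
    for index, value in sparse.items():
        dense[index] = value
    return dense


grad = [0.1, -2.0, 0.03, 1.5, -0.4]
sparse = top_k_sparsify(grad, k=2)
print(sparse)                      # {1: -2.0, 3: 1.5}
print(densify(sparse, len(grad)))  # [0.0, -2.0, 0.0, 1.5, 0.0]
```

In this sketch a client would send only the index-value pairs, so a smaller k lowers the communication cost per round at the price of a coarser update. That is the communication and computation trade-off the abstract refers to.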
