Temporal Predictive Coding for Gradient Compression in Distributed Learning

  • 2024-10-03 14:35:28
  • Adrian Edin, Zheng Chen, Michel Kieffer, Mikael Johansson

Abstract

This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server by exploiting temporal correlation in the local gradients. We use a linear predictor that combines past gradients to form a prediction of the current gradient, with coefficients that are optimized by solving a least-squares problem. In each iteration, every agent transmits the predictor coefficients to the server so that the predicted local gradient can be computed. The difference between the true local gradient and the predicted one, termed the prediction residual, is transmitted only when its norm is above some threshold. When this additional communication step is omitted, the server uses the prediction as the estimated gradient. The proposed design shows notable performance gains compared to existing methods in the literature, achieving convergence with reduced communication costs.
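
The scheme described in the abstract can be illustrated with a minimal NumPy sketch for a single agent. This is not the authors' implementation: the class `PredictiveCoder`, the functions `agent_encode` and `server_decode`, the predictor order, and the zero-initialized history are all hypothetical choices made for illustration. The sketch also assumes that the predictor is built from the past gradient estimates held by the server, with the agent keeping a mirrored copy, so that both sides compute the same prediction from the transmitted coefficients.

```python
import numpy as np
from collections import deque


class PredictiveCoder:
    """History of the last `order` gradient estimates (assumed layout).

    The agent and the server each hold one instance and apply the same
    updates, so their histories stay identical.
    """

    def __init__(self, dim, order=2):
        # Assumption: the history starts at zero, so the first prediction is zero.
        self.history = deque((np.zeros(dim) for _ in range(order)), maxlen=order)

    def basis(self):
        # Columns are the past gradient estimates, shape (dim, order).
        return np.stack(list(self.history), axis=1)

    def update(self, estimate):
        self.history.append(np.asarray(estimate, dtype=float))


def agent_encode(coder, grad, threshold):
    """Fit predictor coefficients by least squares and decide on the residual."""
    H = coder.basis()
    # Least-squares fit: minimize ||grad - H @ coeffs||_2 over coeffs.
    coeffs, *_ = np.linalg.lstsq(H, grad, rcond=None)
    residual = grad - H @ coeffs
    # Event trigger: the residual is sent only if its norm exceeds the threshold.
    send = np.linalg.norm(residual) > threshold
    return coeffs, (residual if send else None)


def server_decode(coder, coeffs, residual):
    """Rebuild the gradient estimate from the coefficients and optional residual."""
    estimate = coder.basis() @ coeffs
    if residual is not None:
        estimate = estimate + residual
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, order, threshold = 5, 2, 0.05
    agent, server = PredictiveCoder(dim, order), PredictiveCoder(dim, order)
    grad = rng.normal(size=dim)
    for t in range(6):
        coeffs, residual = agent_encode(agent, grad, threshold)
        estimate = server_decode(server, coeffs, residual)
        # Both sides store the same estimate so their predictors stay in sync.
        agent.update(estimate)
        server.update(estimate)
        print(t, "residual sent" if residual is not None else "prediction only")
        # Toy gradient sequence with strong temporal correlation.
        grad = 0.9 * grad + 0.01 * rng.normal(size=dim)
```

In this sketch the coefficient vector has only `order` entries, so it is cheap to send compared with a `dim`-dimensional residual; the threshold trades communication for accuracy of the server's gradient estimate.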

 
