Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent

  • 2020-01-16 16:10:58
  • Dan Alistarh, Bapi Chatterjee, Vyacheslav Kungurtsev

Abstract

Machine learning has made tremendous progress in recent years, with models matching or even surpassing humans on a series of specialized tasks. One key element behind the progress of machine learning in recent years has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Many of these models are trained employing variants of stochastic gradient descent (SGD) based optimization. In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency, enables us to derive convergence bounds for a variety of distributed SGD methods used in practice to train large-scale machine learning models. The proposed framework de-clutters the implementation-specific convergence analysis and provides an abstraction to derive convergence bounds. We utilize the framework to analyze a sparsification scheme for distributed SGD methods in an asynchronous setting for convex and non-convex objectives. We implement the distributed SGD variant to train deep CNN models in an asynchronous shared-memory setting. Empirical results show that error-feedback may not necessarily help in improving the convergence of sparsified asynchronous distributed SGD, which corroborates an insight suggested by our convergence analysis.

 
