Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

  • 2019-08-12 15:37:51
  • Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler

Abstract

Load imbalance pervasively exists in distributed deep learning training systems, caused either by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at every training step. In this paper, we propose eager-SGD, which relaxes the global synchronization for decentralized accumulation. To implement eager-SGD, we propose to use two partial collectives: solo and majority. With solo allreduce, the faster processes contribute their gradients eagerly without waiting for the slower processes, whereas with majority allreduce, at least half of the participants must contribute gradients before continuing, all without using a central parameter server. We theoretically prove the convergence of the algorithms and describe the partial collectives in detail. Experimental results in load-imbalanced environments (CIFAR-10, ImageNet, and UCF101 datasets) show that eager-SGD achieves a 1.27x speedup over the state-of-the-art synchronous SGD, without losing accuracy.
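The difference between solo and majority allreduce comes down to how many processes must arrive before the reduction fires. Below is a minimal single-process Python simulation of that idea. It is not the authors' implementation. The function `partial_allreduce` and its `"solo"`/`"majority"`/`"full"` modes are hypothetical. Three further simplifications are assumptions made here for illustration:

* Late processes are simply left out of the step.
* The result is averaged over contributors only.
* "At least half" is read literally as waiting for the ⌈P/2⌉-th arrival.

The real collectives are decentralized and asynchronous, whereas here one function sees every arrival time at once.

```python
import math
from typing import List, Sequence, Tuple


def partial_allreduce(
    ready_times: Sequence[float],
    grads: Sequence[Sequence[float]],
    mode: str,
) -> Tuple[List[float], List[int], float]:
    """Simulate one step of a hypothetical partial allreduce.

    Returns (averaged gradient, ranks that contributed, time the step fires).
    Processes not ready by the trigger time are treated as absent (assumption).
    """
    p = len(ready_times)
    if p == 0 or len(grads) != p:
        raise ValueError("need one ready time and one gradient per process")

    # How many arrivals must happen before the reduction is triggered.
    if mode == "solo":
        needed = 1                   # fastest process triggers it alone
    elif mode == "majority":
        needed = math.ceil(p / 2)    # literal reading of "at least half"
    elif mode == "full":
        needed = p                   # classic synchronous allreduce
    else:
        raise ValueError(f"unknown mode: {mode}")

    trigger = sorted(ready_times)[needed - 1]
    contributors = [r for r in range(p) if ready_times[r] <= trigger]

    # Sum the contributors' gradients elementwise, then average over them.
    dim = len(grads[0])
    total = [0.0] * dim
    for r in contributors:
        for i in range(dim):
            total[i] += grads[r][i]
    avg = [x / len(contributors) for x in total]
    return avg, contributors, trigger


if __name__ == "__main__":
    # Four processes; ranks 2 and 3 are stragglers.
    ready = [1.0, 1.2, 3.5, 4.0]
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    for m in ("solo", "majority", "full"):
        print(m, partial_allreduce(ready, grads, m))
    # solo ([1.0, 2.0], [0], 1.0)
    # majority ([2.0, 3.0], [0, 1], 1.2)
    # full ([4.0, 5.0], [0, 1, 2, 3], 4.0)
```

The step completes at time 1.0 under solo, at 1.2 under majority, and at 4.0 under full synchronization. This illustrates why relaxing the synchronization helps when a few processes are slow. It also shows the trade-off: fewer gradients enter each update.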
