MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning

Abstract

Existing Deep Learning frameworks exclusively use either the Parameter Server (PS) approach or MPI parallelism. In this paper, we discuss the drawbacks of such approaches and propose a generic framework supporting both PS and MPI programming paradigms, co-existing at the same time. The key advantage of the new model is to embed the scaling benefits of MPI parallelism into the loosely coupled PS task model. Apart from providing a practical usage model of MPI in the cloud, such a framework allows for novel communication-avoiding algorithms that do parameter averaging in Stochastic Gradient Descent (SGD) approaches. We show how MPI and PS models can synergistically apply algorithms such as Elastic SGD to improve the rate of convergence against existing approaches. These new algorithms directly help scaling SGD cluster-wide. Further, we also optimize the critical component of the framework, namely global aggregation or allreduce, using a novel concept of tensor collectives. These treat a group of vectors on a node as a single object, allowing the existing single-vector algorithms to be directly applicable. We back our claims with sufficient empirical evidence using large-scale ImageNet 1K data. Our framework is built upon MXNET, but the design is generic and can be adapted to other popular DL infrastructures.
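
The tensor collective idea can be illustrated with a small sketch. The abstract does not say how the framework implements it, so the packing strategy below is an assumption made purely for illustration: each node concatenates its group of vectors into one flat buffer, an ordinary single-vector allreduce runs on that buffer, and the result is split back into the original shapes. The names pack, unpack, allreduce_sum and tensor_allreduce are hypothetical, and allreduce_sum is an in-process stand-in for a real MPI allreduce, not a call into MXNET or any MPI library.

```python
from typing import List, Tuple

Vector = List[float]


def pack(vectors: List[Vector]) -> Tuple[Vector, List[int]]:
    """Concatenate a node's group of vectors into one flat buffer.

    Returns the buffer plus each vector's length so it can be split again.
    """
    flat = [x for v in vectors for x in v]
    lengths = [len(v) for v in vectors]
    return flat, lengths


def unpack(flat: Vector, lengths: List[int]) -> List[Vector]:
    """Split a flat buffer back into vectors of the recorded lengths."""
    out, start = [], 0
    for n in lengths:
        out.append(flat[start:start + n])
        start += n
    return out


def allreduce_sum(buffers: List[Vector]) -> List[Vector]:
    """Stand-in for a single-vector allreduce (assumption: sum reduction).

    Takes one buffer per node and returns the element-wise sum to every node.
    """
    total = [sum(column) for column in zip(*buffers)]
    return [list(total) for _ in buffers]


def tensor_allreduce(groups: List[List[Vector]]) -> List[List[Vector]]:
    """Allreduce a group of vectors per node as if it were a single object."""
    packed = [pack(group) for group in groups]
    lengths = packed[0][1]  # assumes every node holds the same vector shapes
    reduced = allreduce_sum([flat for flat, _ in packed])
    return [unpack(flat, lengths) for flat in reduced]


# Two simulated nodes, each holding a group of two vectors.
node_groups = [
    [[1.0, 2.0], [3.0]],
    [[10.0, 20.0], [30.0]],
]
result = tensor_allreduce(node_groups)
```

In this toy run every node ends up with the same summed group, and the reduction routine never needs to know that more than one vector was involved.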
