GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Abstract

GPipe is a scalable pipeline parallelism library that enables learning of giant deep neural networks. It partitions network layers across accelerators and pipelines execution to achieve high hardware utilization. It leverages recomputation to minimize activation memory usage. For example, using partitions over 8 accelerators, it is able to train networks that are 25x larger, demonstrating its scalability. It also guarantees that the computed gradients remain consistent regardless of the number of partitions. It achieves an almost linear speedup without any changes in the model parameters: when using 4x more accelerators, training the same model is up to 3.5x faster. We train a 557-million-parameter AmoebaNet model and achieve a new state-of-the-art 84.3% top-1 / 97.0% top-5 accuracy on ImageNet. Finally, we use this learned model as an initialization for training on 7 different popular image classification datasets and obtain results that exceed the best published ones on 5 of them, including pushing the CIFAR-10 accuracy to 99% and CIFAR-100 accuracy to 91.3%.
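
The two mechanisms named above, partitioning layers and recomputing activations, can be illustrated with a toy sketch. This is not GPipe's implementation or API: it assumes scalar layers of the form tanh(w*x + b), runs in a single process with no actual pipelining, and every name in it (`partition`, `forward_stage`, `grads_with_recompute`) is hypothetical. It only shows that keeping the input at each partition boundary and recomputing the rest during the backward pass yields the same gradients for any number of partitions.

```python
import math

def partition(layers, k):
    """Split a list of layers into k contiguous, near-equal partitions."""
    n = len(layers)
    bounds = [i * n // k for i in range(k + 1)]
    return [layers[bounds[i]:bounds[i + 1]] for i in range(k)]

def forward_stage(stage, x):
    """Run one partition; return its output and the input seen by each layer."""
    inputs = []
    for w, b in stage:
        inputs.append(x)
        x = math.tanh(w * x + b)
    return x, inputs

def grads_with_recompute(layers, k, x):
    """Return (output, [(d_out/d_w, d_out/d_b) per layer]) using k partitions."""
    stages = partition(layers, k)

    # Forward pass: store only the input at each partition boundary.
    boundary_inputs = []
    for stage in stages:
        boundary_inputs.append(x)
        x, _ = forward_stage(stage, x)  # per-layer activations are discarded
    output = x

    # Backward pass: recompute each partition's activations from its
    # stored boundary input, then apply the chain rule layer by layer.
    upstream = 1.0  # d(output)/d(output)
    grads = []
    for stage, x_in in zip(reversed(stages), reversed(boundary_inputs)):
        _, inputs = forward_stage(stage, x_in)
        for (w, b), a in zip(reversed(stage), reversed(inputs)):
            d_pre = upstream * (1.0 - math.tanh(w * a + b) ** 2)
            grads.append((d_pre * a, d_pre))
            upstream = d_pre * w
    grads.reverse()  # restore layer order
    return output, grads

# Illustrative weights and input, chosen arbitrarily for the sketch.
layers = [(0.5 + 0.1 * i, 0.01 * i) for i in range(8)]
reference = grads_with_recompute(layers, 1, 0.3)
same = all(grads_with_recompute(layers, k, 0.3) == reference for k in (2, 4, 8))
print("gradients identical across partition counts:", same)
```

In this sketch, stored activations drop from one per layer to one per partition, at the cost of one extra forward pass per partition during backpropagation.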
