Error Feedback Fixes SignSGD and other Gradient Compression Schemes

Abstract

Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator. We then show that using error-feedback, i.e. incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm EF-SGD achieves the same rate of convergence as SGD without any additional assumptions for arbitrary compression operators (including the sign operator), indicating that we get gradient compression for free. Our experiments thoroughly substantiate the theory showing the superiority of our algorithm.
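
The error-feedback idea described above can be illustrated with a minimal single-worker sketch in Python. It is an illustration under stated assumptions, not the paper's reference implementation: the function names `scaled_sign` and `ef_sgd_step` are hypothetical, and rescaling the sign vector by the mean absolute value is one assumed choice of compression operator that the abstract itself does not specify.

```python
import numpy as np

def scaled_sign(p):
    """Assumed compressor: keep only the signs of p, rescaled by its mean absolute value."""
    return (np.abs(p).sum() / p.size) * np.sign(p)

def ef_sgd_step(x, e, grad, lr, compress=scaled_sign):
    """One error-feedback step (hypothetical helper).

    Returns the new iterate and the new accumulated compression error.
    """
    p = lr * grad + e            # add back the error left over from the previous step
    delta = compress(p)          # compressed update (what would be communicated)
    return x - delta, p - delta  # the residual p - delta is carried into the next step

# Usage: minimise f(x) = 0.5 * ||x - target||^2 using exact gradients.
target = np.array([1.0, -2.0, 0.5])
x = np.zeros(3)
e = np.zeros(3)
for _ in range(200):
    x, e = ef_sgd_step(x, e, grad=x - target, lr=0.1)
```

Because the residual `p - delta` is added back before the next compression, no part of the gradient is discarded permanently; it is only delayed. Passing a different function as `compress` would swap in another compression operator without changing the rest of the step.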
