Abstract
Communication compression is a common technique in distributed optimization that can alleviate communication overhead by transmitting compressed gradients and model parameters. However, compression can introduce information distortion, which slows down convergence and incurs more communication rounds to achieve desired solutions. Given the trade-off between lower per-round communication costs and additional rounds of communication, it is unclear whether communication compression reduces the total communication cost. This paper explores the conditions under which unbiased compression, a widely used form of compression, can reduce the total communication cost, as well as the extent to which it can do so. To this end, we present the first theoretical formulation for characterizing the total communication cost in distributed optimization with communication compression. We demonstrate that unbiased compression alone does not necessarily reduce the total communication cost, but that such a reduction can be achieved if the compressors used by all workers are further assumed to be independent. We establish lower bounds on the communication rounds required by algorithms using independent unbiased compressors to minimize smooth convex functions, and show that these lower bounds are tight by refining the analysis for ADIANA. Our results reveal that using independent unbiased compression can reduce the total communication cost by a factor of up to $\Theta(\sqrt{\min\{n, \kappa\}})$, where $n$ is the number of workers and $\kappa$ is the condition number of the functions being minimized. These theoretical findings are supported by experimental results.
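To illustrate why independence among the workers' compressors matters, the following is a minimal sketch and not the paper's experimental setup. It assumes rand-k sparsification as the unbiased compressor (keep $k$ of $d$ coordinates and rescale by $d/k$) and, as a simplification, lets all $n$ workers compress the same vector. The function names `rand_k`, `sq_error_of_average`, and `mean_sq_error` are hypothetical. With a shared random choice the averaged message is no more accurate than a single compressed message, whereas with independent choices the expected squared error shrinks by roughly a factor of $n$.

```python
import random

def rand_k(x, k, rng):
    """Rand-k sparsification (assumed compressor): keep k random coordinates,
    scaled by d/k so that the output is unbiased, E[C(x)] = x."""
    d = len(x)
    kept = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]

def sq_error_of_average(x, k, n, rng, shared):
    """Squared error ||avg_i C_i(x) - x||^2 when n workers compress the same x."""
    d = len(x)
    if shared:
        # Fully correlated case: every worker uses the same random choice,
        # so the average equals a single compressed message.
        avg = rand_k(x, k, rng)
    else:
        # Independent case: each worker draws its own coordinates.
        msgs = [rand_k(x, k, rng) for _ in range(n)]
        avg = [sum(m[j] for m in msgs) / n for j in range(d)]
    return sum((a - b) ** 2 for a, b in zip(avg, x))

def mean_sq_error(x, k, n, shared, trials=2000, seed=0):
    """Monte Carlo estimate of the expected squared error of the average."""
    rng = random.Random(seed)
    total = sum(sq_error_of_average(x, k, n, rng, shared) for _ in range(trials))
    return total / trials

x = [1.0] * 20                      # illustrative vector, d = 20
err_shared = mean_sq_error(x, k=2, n=10, shared=True)
err_indep = mean_sq_error(x, k=2, n=10, shared=False)
print(f"shared compressors:      {err_shared:.2f}")
print(f"independent compressors: {err_indep:.2f}")
```

For rand-k the relative variance is $d/k - 1$, so in this toy setting the shared case has error $(d/k - 1)\|x\|^2$ while the independent case is close to that value divided by $n$. This only illustrates the variance-reduction effect of independence; it says nothing by itself about the round-complexity bounds stated above.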