### Abstract

In large-scale distributed machine learning, recent works have studied theeffects of compressing gradients in stochastic optimization to alleviate thecommunication bottleneck. These works have collectively revealed thatstochastic gradient descent (SGD) is robust to structured perturbations such asquantization, sparsification, and delays. Perhaps surprisingly, despite thesurge of interest in multi-agent reinforcement learning, almost nothing isknown about the analogous question: Are common reinforcement learning (RL)algorithms also robust to similar perturbations? We investigate this questionby studying a variant of the classical temporal difference (TD) learningalgorithm with a perturbed update direction, where a general compressionoperator is used to model the perturbation. Our work makes three importanttechnical contributions. First, we prove that compressed TD algorithms, coupledwith an error-feedback mechanism used widely in optimization, exhibit the samenon-asymptotic theoretical guarantees as their SGD counterparts. Second, weshow that our analysis framework extends seamlessly to nonlinear stochasticapproximation schemes that subsume Q-learning. Third, we prove that formulti-agent TD learning, one can achieve linear convergence speedups withrespect to the number of agents while communicating just $\tilde{O}(1)$ bitsper iteration. Notably, these are the first finite-time results in RL thataccount for general compression operators and error-feedback in tandem withlinear function approximation and Markovian sampling. Our proofs hinge on theconstruction of novel Lyapunov functions that capture the dynamics of a memoryvariable introduced by error-feedback.