This paper deals with distributed reinforcement learning (DRL), which involves a central controller and a group of learners. In particular, two DRL settings encountered in several applications are considered: multi-agent reinforcement learning (RL) and parallel RL, where frequent information exchanges between the learners and the controller are required. For many practical distributed systems, however, such as those involving parallel machines for training deep RL algorithms, and multi-robot systems for learning the optimal coordination strategies, the overhead caused by these frequent communication exchanges is considerable, and becomes the bottleneck of the overall performance. To address this challenge, a novel policy gradient method is developed here to cope with such communication-constrained DRL settings. The proposed approach reduces the communication overhead without degrading learning performance by adaptively skipping the policy gradient communication during iterations. It is established analytically that i) the novel algorithm has a convergence rate identical to that of the plain-vanilla policy gradient for DRL; while ii) if the distributed computing units are heterogeneous in terms of their reward functions and initial state distributions, the number of communication rounds needed to achieve a desirable learning accuracy is markedly reduced. Numerical experiments on a popular multi-agent RL benchmark corroborate the significant communication reduction attained by the novel algorithm compared to alternatives.