Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Abstract

Poor sample efficiency is a notorious issue in Reinforcement Learning (RL). Compared to single-agent RL, achieving sample efficiency in Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods that enhance sample efficiency, we instead examine the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate $3$ MARL methods on $6$ SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency of MARL algorithms. The code to reproduce the results presented in this paper is open-sourced at https://anonymous.4open.science/r/rr_for_MARL-0D83/.
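
To make the replay ratio concrete, the following is a minimal sketch of an episodic training loop in which the number of gradient steps per collected episode is a tunable parameter. It is not the authors' implementation: the function `run_training`, its parameters, the random "frames", and the counter that stands in for an optimizer step are all hypothetical placeholders chosen for illustration. The sketch only counts environment steps and gradient steps, and reports their ratio.

```python
import random
from collections import deque


def run_training(num_episodes, episode_len, updates_per_episode,
                 batch_size=4, seed=0):
    """Toy episodic loop (hypothetical): collect one episode, then take
    `updates_per_episode` gradient steps on batches of stored episodes."""
    rng = random.Random(seed)
    buffer = deque(maxlen=100)  # replay buffer holding whole episodes
    env_steps = 0
    grad_steps = 0
    for _ in range(num_episodes):
        # Placeholder for environment interaction: an episode of random frames.
        episode = [rng.random() for _ in range(episode_len)]
        buffer.append(episode)
        env_steps += episode_len
        # Reuse the collected data: several updates per collected episode.
        for _ in range(updates_per_episode):
            batch = rng.sample(list(buffer), k=min(batch_size, len(buffer)))
            # A real learner would compute a loss on `batch` and step the
            # optimizer here; this sketch only counts the update.
            grad_steps += 1
    # Replay ratio: gradient updates per environment interaction.
    return env_steps, grad_steps, grad_steps / env_steps


if __name__ == "__main__":
    for k in (1, 4):
        print(k, run_training(num_episodes=10, episode_len=50,
                              updates_per_episode=k))
```

With one update per 50-frame episode, the ratio in this toy setting is 1/50; raising `updates_per_episode` to 4 quadruples it without collecting any additional environment frames, which is the lever the abstract proposes to adjust.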
