Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning

Abstract

In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint action space that grows exponentially with the number of agents. While the tractability of independent agent-wise exploration is appealing, this approach fails on tasks that require elaborate group strategies. We argue that coordinating the agents' policies can guide their exploration and we investigate techniques to promote such an inductive bias. We propose two policy regularization methods: TeamReg, which is based on inter-agent action predictability, and CoachReg, which relies on synchronized behavior selection. We evaluate each approach on four challenging continuous control tasks with sparse rewards that require varying levels of coordination. Our methodology allocates the same hyper-parameter search budget across our algorithms and baselines and we find that our approaches are more robust to hyper-parameter variations. Our experiments show that our methods significantly improve performance on cooperative multi-agent problems and scale well when the number of agents is increased. Finally, we quantitatively analyze the effects of our proposed methods on the policies that our agents learn and we show that our methods successfully enforce the qualities that we propose as proxies for coordinated behaviors.
