CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning

Abstract

We propose CM3, a new deep reinforcement learning method for cooperative multi-agent problems where agents must coordinate for joint success in achieving different individual goals. We restructure multi-agent learning into a two-stage curriculum, consisting of a single-agent stage for learning to accomplish individual tasks, followed by a multi-agent stage for learning to cooperate in the presence of other agents. These two stages are bridged by modular augmentation of neural network policy and value functions. We further adapt the actor-critic framework to this curriculum by formulating local and global views of the policy gradient and learning via a double critic, consisting of a decentralized value function and a centralized action-value function. We evaluated CM3 on a new high-dimensional multi-agent environment with sparse rewards: negotiating lane changes among multiple autonomous vehicles in the Simulation of Urban Mobility (SUMO) traffic simulator. Detailed ablation experiments show the positive contribution of each component in CM3, and the overall synthesis converges significantly faster to higher-performance policies than existing cooperative multi-agent methods.
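As a rough illustration of the double-critic idea, the sketch below computes a "local" advantage from a decentralized value function V(o^n, g^n) and a "global" advantage from a centralized action-value function Q. It is a minimal sketch under stated assumptions, not the exact CM3 update: the TD(0) form of the local advantage, the COMA-style counterfactual baseline for the global advantage, and the convex mixing weight `alpha` are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
from typing import Sequence


def local_advantage(reward: float, v_obs: float, v_next_obs: float,
                    gamma: float = 0.99) -> float:
    """TD(0) advantage from a decentralized critic V(o^n, g^n).

    Assumption: uses only agent n's own observation, goal, and individual reward.
    """
    return reward + gamma * v_next_obs - v_obs


def global_advantage(q_values: Sequence[float], policy_probs: Sequence[float],
                     action: int) -> float:
    """COMA-style counterfactual advantage from a centralized critic Q.

    Assumption: q_values[k] is Q(s, (a^{-n}, k)), i.e. the joint action with
    agent n's action replaced by k while the other agents' actions stay fixed.
    The baseline marginalizes agent n's action under its own policy.
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[action] - baseline


def combined_advantage(local: float, global_: float, alpha: float = 0.5) -> float:
    """Hypothetical convex mix of the local and global policy-gradient signals."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * local + (1.0 - alpha) * global_


if __name__ == "__main__":
    # Toy numbers for one agent at one time step (illustrative only).
    loc = local_advantage(reward=1.0, v_obs=0.5, v_next_obs=0.6, gamma=0.9)
    glob = global_advantage(q_values=[1.0, 2.0, 0.0],
                            policy_probs=[0.2, 0.5, 0.3], action=1)
    print(loc, glob, combined_advantage(loc, glob, alpha=0.5))
```

In a full actor-critic learner, the resulting scalar would multiply the gradient of log pi(a^n | o^n, g^n). Keeping the local signal separate from the global one mirrors the abstract's split between individual goal achievement and cooperation.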
