Abstract
Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, centralized RL is infeasible for large-scale ATSC due to the extremely high dimension of the joint action space. Multi-agent RL (MARL) overcomes the scalability issue by distributing the global control to each local RL agent, but it introduces new challenges: now the environment becomes partially observable from the viewpoint of each local agent due to limited communication among agents. Most existing studies in MARL focus on designing efficient communication and coordination among traditional Q-learning agents. This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent: advantage actor critic (A2C), within the context of ATSC. In particular, two methods are proposed to stabilize the learning procedure, by improving the observability and reducing the learning difficulty of each local agent. The proposed multi-agent A2C is compared against independent A2C and independent Q-learning algorithms, in both a large synthetic traffic grid and a large real-world traffic network of Monaco city, under simulated peak-hour traffic dynamics. Results demonstrate its optimality, robustness, and sample efficiency over other state-of-the-art decentralized MARL algorithms.