Model-based graph reinforcement learning for inductive traffic signal control

Abstract

Most reinforcement learning methods for adaptive traffic signal control require training from scratch before they can be applied to any new intersection, or after any modification to the road network, traffic distribution, or behavioral constraints experienced during training. Considering 1) the massive amount of experience required to train such methods, and 2) that experience must be gathered by interacting in an exploratory fashion with real road-network users, such a lack of transferability limits experimentation and applicability. Recent approaches enable learning policies that generalize to unseen road-network topologies and traffic distributions, partially tackling this challenge. However, the literature remains divided between the learning of cyclic policies (the evolution of connectivity at an intersection must respect a cycle) and acyclic (less constrained) policies, and these transferable methods 1) are only compatible with cyclic constraints and 2) do not enable coordination. We introduce a new model-based method, MuJAM, which, on top of enabling explicit coordination at scale for the first time, pushes generalization further by also generalizing over the controllers' constraints. In a zero-shot transfer setting involving both road networks and traffic settings never experienced during training, and in a larger transfer experiment involving the control of 3,971 traffic signal controllers in Manhattan, we show that MuJAM, using both cyclic and acyclic constraints, outperforms domain-specific baselines as well as another transferable approach.
