Abstract
Traffic signal control (TSC) is vital for mitigating congestion and sustaining urban mobility. In this paper, we introduce Traffic-R1, a foundation model with human-like reasoning for TSC systems. Our model is developed through self-exploration and iteration of reinforced large language models (LLMs) with expert guidance in a simulated traffic environment. Compared to traditional reinforcement learning (RL) and recent LLM-based methods, Traffic-R1 offers three significant advantages. First, Traffic-R1 delivers zero-shot generalisation, transferring unchanged to new road networks and out-of-distribution incidents by utilizing its internal traffic control policies and human-like reasoning. Second, its 3B-parameter architecture is lightweight enough for real-time inference on mobile-class chips, enabling large-scale edge deployment. Third, Traffic-R1 provides an explainable TSC process and facilitates multi-intersection communication through its self-iteration and a new synchronous communication network. Extensive benchmarks demonstrate that Traffic-R1 sets a new state of the art, outperforming strong baselines and training-intensive RL controllers. In practice, the model now manages signals for more than 55,000 drivers daily, shortening average queues by over 5% and halving operator workload. Our checkpoint is available at https://huggingface.co/Season998/Traffic-R1.