Abstract
Traffic forecasting on road networks is a complex task of significant practical importance that has recently attracted considerable attention from the machine learning community, with spatiotemporal graph neural networks (GNNs) becoming the most popular approach. The proper evaluation of traffic forecasting methods requires realistic datasets, but current publicly available benchmarks have significant drawbacks. These include the absence of road connectivity information needed for road graph construction, limited information about road properties, and a relatively small number of road segments that falls short of real-world applications. Further, current datasets mostly contain information about intercity highways with sparsely located sensors, while city road networks arguably present a more challenging forecasting task due to much denser roads and more complex urban traffic patterns. In this work, we provide a more complete, realistic, and challenging benchmark for traffic forecasting by releasing datasets representing the road networks of two major cities, with the largest containing almost 100,000 road segments (more than a 10-fold increase relative to existing datasets). Our datasets contain rich road features and provide fine-grained data about both traffic volume and traffic speed, enabling more holistic traffic forecasting systems. We show that most current implementations of neural spatiotemporal models for traffic forecasting struggle to scale to datasets of our size. To overcome this issue, we propose an alternative approach to neural traffic forecasting that uses a GNN without a dedicated module for temporal sequence processing. This approach achieves much better scalability while also demonstrating stronger forecasting performance. We hope our datasets and modeling insights will serve as a valuable resource for research in traffic forecasting.