Assessment of Reward Functions in Reinforcement Learning for Multi-Modal Urban Traffic Control under Real-World Limitations

Abstract

Reinforcement Learning is proving to be a successful tool for managing urban intersections with a fraction of the effort required to curate traditional traffic controllers. However, the literature on incorporating and controlling pedestrians at such intersections is scarce. Furthermore, it is unclear which traffic state variables should be used as rewards to obtain the best agent performance. This paper robustly evaluates 30 different Reinforcement Learning reward functions for controlling intersections that serve both pedestrians and vehicles, covering the main traffic state variables available from modern vision-based sensors. Some rewards previously proposed solely for vehicular traffic are extended to pedestrians, and new ones are introduced. We use a model of a real intersection in Greater Manchester, UK, calibrated in terms of demand, sensors, green times and other operational constraints. The assessed rewards can be classified into five groups according to the magnitude they use: queues, waiting time, delay, average speed and throughput in the junction. The performance of the different agents, in terms of waiting time, is compared across demand levels ranging from normal operation to the saturation of traditional adaptive controllers. We find that the rewards maximising the speed of the network obtain the lowest waiting times for vehicles and pedestrians simultaneously, closely followed by queue minimisation, and outperform other previously proposed methods.
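To make the five reward groups concrete, the sketch below shows one plausible per-step reward for each group, computed jointly over vehicles and pedestrians. It is a minimal illustration under stated assumptions, not the paper's definitions: the `RoadUser` and `Snapshot` types, their fields, and every formula are hypothetical, and the 30 evaluated rewards may normalise, weight, or combine these magnitudes differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoadUser:
    """Hypothetical per-user measurement a vision-based sensor might report."""
    speed: float         # current speed in m/s
    max_speed: float     # free-flow (desired) speed in m/s, assumed > 0
    waiting_time: float  # accumulated time spent stopped, in s
    queued: bool         # True if the user is currently stopped in a queue


@dataclass
class Snapshot:
    """Hypothetical state of the junction at one control step."""
    vehicles: List[RoadUser]
    pedestrians: List[RoadUser]
    exited: int          # users that cleared the junction during this step


def _users(s: Snapshot) -> List[RoadUser]:
    # Multi-modal: vehicles and pedestrians are pooled with equal weight.
    return s.vehicles + s.pedestrians


def queue_reward(s: Snapshot) -> float:
    # Queue group: penalise the number of stopped road users.
    return -float(sum(u.queued for u in _users(s)))


def waiting_time_reward(s: Snapshot) -> float:
    # Waiting-time group: penalise total accumulated waiting time.
    return -sum(u.waiting_time for u in _users(s))


def delay_reward(s: Snapshot, dt: float = 1.0) -> float:
    # Delay group: time lost in this step relative to free-flow speed.
    return -sum((1.0 - u.speed / u.max_speed) * dt for u in _users(s))


def speed_reward(s: Snapshot) -> float:
    # Average-speed group: mean normalised speed in [0, 1]; 0 if empty.
    users = _users(s)
    if not users:
        return 0.0
    return sum(u.speed / u.max_speed for u in users) / len(users)


def throughput_reward(s: Snapshot) -> float:
    # Throughput group: reward users that left the junction this step.
    return float(s.exited)


if __name__ == "__main__":
    snap = Snapshot(
        vehicles=[RoadUser(0.0, 13.9, 12.0, True), RoadUser(13.9, 13.9, 0.0, False)],
        pedestrians=[RoadUser(0.0, 1.4, 30.0, True)],
        exited=2,
    )
    for f in (queue_reward, waiting_time_reward, delay_reward,
              speed_reward, throughput_reward):
        print(f.__name__, round(f(snap), 3))
```

Normalising speed by each user's own free-flow speed keeps a slow-moving pedestrian from being treated as "stuck" relative to a car; a real multi-modal reward could instead apply explicit per-mode weights, which this sketch omits.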
