Scaling Is All You Need: Training Strong Policies for Autonomous Driving with JAX-Accelerated Reinforcement Learning

Abstract

Reinforcement learning has been used to train policies that outperform even the best human players in various games. However, a large amount of data is needed to achieve good performance, which in turn requires building large-scale frameworks and simulators. In this paper, we study how large-scale reinforcement learning can be applied to autonomous driving, analyze how the resulting policies perform as the experiment size is scaled, and what the most important factors contributing to policy performance are. To do this, we first introduce a hardware-accelerated autonomous driving simulator, which allows us to efficiently collect experience from billions of agent steps. This simulator is paired with a large-scale, multi-GPU reinforcement learning framework. We demonstrate that simultaneous scaling of dataset size, model size, and agent steps trained provides increasingly strong driving policies in regard to collision, traffic rule violations, and progress. In particular, our best policy reduces the failure rate by 57% while improving progress by 23% compared to the current state-of-the-art machine learning policies for autonomous driving.
