RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Abstract

Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to effectively respond to safety-critical events and understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, notably a 3x lower collision rate. Abundant closed-loop results are presented at https://hgao-cv.github.io/RAD.
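
The idea of using IL as a regularization term inside RL training can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything below is an assumption made for illustration: the function names, the mean-absolute-error form of the IL term, and the `il_weight` coefficient are hypothetical and are not taken from RAD.

```python
from typing import Sequence


def il_regularizer(policy_actions: Sequence[float],
                   expert_actions: Sequence[float]) -> float:
    """Mean absolute deviation between policy and demonstrated actions.

    Hypothetical choice of IL term; the paper's actual term may differ.
    """
    if len(policy_actions) != len(expert_actions):
        raise ValueError("action sequences must have equal length")
    if not policy_actions:
        return 0.0
    total = sum(abs(p - e) for p, e in zip(policy_actions, expert_actions))
    return total / len(policy_actions)


def regularized_loss(rl_loss: float,
                     policy_actions: Sequence[float],
                     expert_actions: Sequence[float],
                     il_weight: float = 0.5) -> float:
    """Combine an RL loss with a weighted IL penalty (assumed additive form)."""
    return rl_loss + il_weight * il_regularizer(policy_actions, expert_actions)
```

Under these assumptions, a larger `il_weight` pulls the policy toward the human demonstrations, while `il_weight = 0` recovers the pure RL objective.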
