Abstract
Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging. In this work, to cope with the uncertainty problem, we propose VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes multi-view image sequences as input in a streaming manner, transforms sensor data into environmental token embeddings, outputs a probability distribution over actions, and samples one action to control the vehicle. Using only camera sensors, VADv2 achieves state-of-the-art closed-loop performance on the CARLA Town05 benchmark, significantly outperforming all existing methods. It runs stably in a fully end-to-end manner, even without a rule-based wrapper. Closed-loop demos are presented at https://hgao-cv.github.io/VADv2.
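
The final step described above, producing a distribution over actions and sampling one of them, can be illustrated with a minimal sketch. This is not the VADv2 implementation: the abstract does not say how the distribution is parameterized, so the sketch assumes a small discrete set of candidate actions scored by a model. The names ACTIONS, softmax, and sample_action, and the score values, are hypothetical and chosen only for illustration.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(scores, actions, rng):
    """Sample one action with probability given by softmax(scores)."""
    probs = softmax(scores)
    index = rng.choices(range(len(actions)), weights=probs, k=1)[0]
    return actions[index], probs

# Hypothetical candidate actions and model scores (assumptions, not from the paper).
ACTIONS = ["keep_lane", "turn_left", "turn_right", "brake"]
scores = [2.0, 0.5, 0.1, 1.0]

rng = random.Random(0)  # seeded so repeated runs give the same sample
action, probs = sample_action(scores, ACTIONS, rng)
print(action, [round(p, 3) for p in probs])
```

Sampling, rather than always taking the highest-scoring action, lets a policy keep several plausible actions in play for the same scene, which is one simple way to reflect the non-deterministic nature of planning mentioned above.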