Decoupled Diffusion Sparks Adaptive Scene Generation

Abstract

Controllable scene generation could substantially reduce the cost of diverse data collection for autonomous driving. Prior works formulate traffic layout generation as a prediction process, either by denoising entire sequences at once or by iteratively predicting the next frame. However, full-sequence denoising hinders online reaction, while short-sighted next-frame prediction lacks precise goal-state guidance. Further, the learned model struggles to generate complex or challenging scenarios because open datasets are dominated by safe and ordinary driving behaviors. To overcome these limitations, we introduce Nexus, a decoupled scene generation framework that improves reactivity and goal conditioning by simulating both ordinary and challenging scenarios from fine-grained tokens with independent noise states. At the core of the decoupled pipeline is the integration of a partial noise-masking training strategy and a noise-aware schedule that ensures timely environmental updates throughout the denoising process. To complement challenging scenario generation, we collect a dataset consisting of complex corner cases. It covers 540 hours of simulated data, including high-risk interactions such as cut-in, sudden braking, and collision. Nexus achieves superior generation realism while preserving reactivity and goal orientation, with a 40% reduction in displacement error. We further demonstrate that Nexus improves closed-loop planning by 20% through data augmentation and showcase its capability in safety-critical data generation.
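
The idea of tokens with independent noise states, combined with partial noise masking, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine_alpha_bar`, `partially_noise`), the cosine schedule, and the choice to flag conditioning tokens with a boolean mask are all assumptions made for illustration. Each token draws its own noise level, and tokens marked as clean (for example, observed history or a goal state) are left uncorrupted.

```python
import math
import random


def cosine_alpha_bar(t: float) -> float:
    """Signal-retention factor for noise level t in [0, 1].

    A cosine schedule is assumed here; the abstract does not specify one.
    """
    return math.cos(0.5 * math.pi * t) ** 2


def partially_noise(tokens, keep_clean, rng):
    """Corrupt each token with its own noise level (hypothetical helper).

    tokens:     list of feature vectors, one per scene token.
    keep_clean: list of bools; True marks a token used as conditioning
                (e.g. history or goal), which gets noise level 0.
    rng:        a random.Random instance, for reproducibility.
    Returns (noisy_tokens, noise_levels).
    """
    noisy, levels = [], []
    for x, clean in zip(tokens, keep_clean):
        # Independent noise level per token; masked tokens stay at level 0.
        t = 0.0 if clean else rng.random()
        a = cosine_alpha_bar(t)
        noisy.append(
            [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x]
        )
        levels.append(t)
    return noisy, levels


rng = random.Random(0)
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
noisy, levels = partially_noise(tokens, [True, False, True], rng)
```

In this sketch the first and last tokens pass through unchanged because their noise level is 0, while the middle token is corrupted at a randomly drawn level. A denoiser trained on such inputs would receive the per-token noise levels alongside the tokens, which is one plausible way to let clean context guide the tokens that are still noisy.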
