Abstract
Generative world models have become essential data engines for autonomous driving, yet most existing efforts focus on videos or occupancy grids, overlooking the unique properties of LiDAR. Extending LiDAR generation to dynamic 4D world modeling presents challenges in controllability, temporal coherence, and evaluation standardization. To this end, we present LiDARCrafter, a unified framework for 4D LiDAR generation and editing. Given free-form natural language inputs, we parse instructions into ego-centric scene graphs, which condition a tri-branch diffusion network to generate object structures, motion trajectories, and geometry. These structured conditions enable diverse and fine-grained scene editing. Additionally, an autoregressive module generates temporally coherent 4D LiDAR sequences with smooth transitions. To support standardized evaluation, we establish a comprehensive benchmark with diverse metrics spanning scene-, object-, and sequence-level aspects. Experiments on the nuScenes dataset using this benchmark demonstrate that LiDARCrafter achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels, paving the way for data augmentation and simulation. The code and benchmark are released to the community.