Abstract
LiDAR scene generation has developed rapidly in recent years. However, existing methods primarily focus on generating static, single-frame scenes, overlooking the inherently dynamic nature of real-world driving environments. In this work, we introduce DynamicCity, a novel 4D LiDAR generation framework capable of generating large-scale, high-quality LiDAR scenes that capture the temporal evolution of dynamic environments. DynamicCity consists of two key models. 1) A VAE model for learning HexPlane as the compact 4D representation. Instead of using naive averaging operations, DynamicCity employs a novel Projection Module to effectively compress 4D LiDAR features into six 2D feature maps for HexPlane construction, which significantly enhances HexPlane fitting quality (up to 12.56 mIoU gain). Furthermore, we utilize an Expansion & Squeeze Strategy to reconstruct 3D feature volumes in parallel, which improves both network training efficiency and reconstruction accuracy compared with naively querying each 3D point (up to 7.05 mIoU gain, 2.06x training speedup, and 70.84% memory reduction). 2) A DiT-based diffusion model for HexPlane generation. To make HexPlane feasible for DiT generation, a Padded Rollout Operation is proposed to reorganize all six feature planes of the HexPlane into a single square 2D feature map. In particular, various conditions can be introduced in the diffusion or sampling process, supporting versatile 4D generation applications such as trajectory- and command-driven generation, inpainting, and layout-conditioned generation. Extensive experiments on the CarlaSC and Waymo datasets demonstrate that DynamicCity significantly outperforms existing state-of-the-art 4D LiDAR generation methods across multiple metrics. The code will be released to facilitate future research.
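To illustrate the idea behind the Padded Rollout Operation, the following is a minimal NumPy sketch of packing six feature planes into one zero-padded square map and recovering them again. The abstract does not specify the actual plane arrangement, so the layout, the plane names (`xy`, `xz`, `xt`, `zy`, `zt`, `ty`), and the functions `rollout_layout`, `padded_rollout`, and `unroll` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def rollout_layout(sizes):
    """Return {plane: (row, col, height, width)} for an ASSUMED layout.

    sizes = (X, Y, Z, T). The placement below is illustrative only;
    it is not taken from the paper.
    """
    X, Y, Z, T = sizes
    return {
        "xy": (0, 0, X, Y),
        "xz": (0, Y, X, Z),
        "xt": (0, Y + Z, X, T),
        "zy": (X, 0, Z, Y),
        "zt": (X, Y + Z, Z, T),
        "ty": (X + Z, 0, T, Y),
    }


def padded_rollout(planes, sizes):
    """Pack six (C, H, W) planes into one zero-padded (C, S, S) map."""
    X, Y, Z, T = sizes
    side = max(X, Y) + Z + T  # smallest square that fits this layout
    channels = planes["xy"].shape[0]
    canvas = np.zeros((channels, side, side), dtype=planes["xy"].dtype)
    for name, (r, c, h, w) in rollout_layout(sizes).items():
        canvas[:, r:r + h, c:c + w] = planes[name]
    return canvas


def unroll(canvas, sizes):
    """Inverse of padded_rollout: slice the six planes back out."""
    return {
        name: canvas[:, r:r + h, c:c + w].copy()
        for name, (r, c, h, w) in rollout_layout(sizes).items()
    }
```

Under this assumed layout, the region where no plane is placed stays zero, and `unroll(padded_rollout(planes, sizes), sizes)` returns the original planes unchanged, so a generator operating on the square map can hand its output back to a HexPlane decoder.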