3D Scene Generation: A Survey

Abstract

3D scene generation seeks to synthesize spatially structured, semantically meaningful, and photorealistic environments for applications such as immersive media, robotics, autonomous driving, and embodied AI. Early methods based on procedural rules offered scalability but limited diversity. Recent advances in deep generative models (e.g., GANs, diffusion models) and 3D representations (e.g., NeRF, 3D Gaussians) have enabled the learning of real-world scene distributions, improving fidelity, diversity, and view consistency. Recent advances like diffusion models bridge 3D scene synthesis and photorealism by reframing generation as image or video synthesis problems. This survey provides a systematic overview of state-of-the-art approaches, organizing them into four paradigms: procedural generation, neural 3D-based generation, image-based generation, and video-based generation. We analyze their technical foundations, trade-offs, and representative results, and review commonly used datasets, evaluation protocols, and downstream applications. We conclude by discussing key challenges in generation capacity, 3D representation, data and annotations, and evaluation, and outline promising directions including higher fidelity, physics-aware and interactive generation, and unified perception-generation models. This review organizes recent advances in 3D scene generation and highlights promising directions at the intersection of generative AI, 3D vision, and embodied intelligence. To track ongoing developments, we maintain an up-to-date project page: https://github.com/hzxie/Awesome-3D-Scene-Generation.
