Abstract
Generative models have gained significant attention in novel view synthesis (NVS) by alleviating the reliance on dense multi-view captures. However, existing methods typically follow a conventional paradigm in which generative models first complete missing areas in 2D and 3D recovery techniques then reconstruct the scene. This often results in overly smooth surfaces and distorted geometry, as generative models struggle to infer 3D structure solely from RGB data. In this paper, we propose SceneCompleter, a novel framework that achieves 3D-consistent generative novel view synthesis through dense 3D scene completion. SceneCompleter achieves both visual coherence and 3D-consistent generative scene completion through two key components: (1) a geometry-appearance dual-stream diffusion model that jointly synthesizes novel views in RGBD space; (2) a scene embedder that encodes a more holistic scene understanding from the reference image. By effectively fusing structural and textural information, our method demonstrates superior coherence and plausibility in generative novel view synthesis across diverse datasets. Project Page: https://chen-wl20.github.io/SceneCompleter