GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting

Abstract

Single-image 3D scene reconstruction presents significant challenges due to its inherently ill-posed nature and limited input constraints. Recent advances have explored two promising directions: multiview generative models that train on 3D consistent datasets but struggle with out-of-distribution generalization, and 3D scene inpainting and completion frameworks that suffer from cross-view inconsistency and suboptimal error handling, as they depend exclusively on depth data or 3D smoothness, which ultimately degrades output quality and computational performance. Building upon these approaches, we present GaussVideoDreamer, which advances generative multimedia approaches by bridging the gap between image, video, and 3D generation, integrating their strengths through two key innovations: (1) A progressive video inpainting strategy that harnesses temporal coherence for improved multiview consistency and faster convergence. (2) A 3D Gaussian Splatting consistency mask to guide the video diffusion with 3D consistent multiview evidence. Our pipeline combines three core components: a geometry-aware initialization protocol, Inconsistency-Aware Gaussian Splatting, and a progressive video inpainting strategy. Experimental results demonstrate that our approach achieves 32% higher LLaVA-IQA scores and at least 2x speedup compared to existing methods while maintaining robust performance across diverse scenes.
