Abstract
Novel-view synthesis aims to generate novel views of a scene from multiple input images or videos, and recent advances such as 3D Gaussian splatting (3DGS) have achieved notable success in producing photorealistic renderings with efficient pipelines. However, generating high-quality novel views under challenging settings, such as sparse input views, remains difficult due to insufficient information in under-sampled areas, often resulting in noticeable artifacts. This paper presents 3DGS-Enhancer, a novel pipeline for enhancing the quality of 3DGS representations. We leverage 2D video diffusion priors to address the challenging 3D view consistency problem, reformulating it as achieving temporal consistency within a video generation process. 3DGS-Enhancer restores view-consistent latent features of rendered novel views and integrates them with the input views through a spatial-temporal decoder. The enhanced views are then used to fine-tune the initial 3DGS model, significantly improving its rendering performance. Extensive experiments on large-scale datasets of unbounded scenes demonstrate that 3DGS-Enhancer yields superior reconstruction performance and high-fidelity rendering results compared to state-of-the-art methods. Project webpage: https://xiliu8006.github.io/3DGS-Enhancer-project