SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Abstract

We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and novel view synthesis, we design a unified diffusion model to generate novel view videos of dynamic 3D objects. Specifically, given a monocular reference video, SV4D generates novel views for each video frame that are temporally consistent. We then use the generated novel view videos to optimize an implicit 4D representation (dynamic NeRF) efficiently, without the need for cumbersome SDS-based optimization used in most prior works. To train our unified novel view video generation model, we curated a dynamic 3D object dataset from the existing Objaverse dataset. Extensive experimental results on multiple datasets and user studies demonstrate SV4D's state-of-the-art performance on novel-view video synthesis as well as 4D generation compared to prior works.
