CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

Abstract

We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconstruction via optimization of a deformable 3D Gaussian representation. We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks, and highlight the creative capabilities for 4D scene generation from real or generated videos. See our project page for results and interactive demos: cat-4d.github.io.
