DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos

Abstract

We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms. To address these challenges, we introduce several key technical contributions: an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision; a per-pixel deformable 3D Gaussian representation that is easy to learn, supports high-quality dynamic view synthesis, and enables long-range 3D tracking; and a large transformer network that achieves real-time, generalizable dynamic scene reconstruction. Extensive qualitative and quantitative experiments demonstrate that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods, while significantly outperforming the state-of-the-art predictive dynamic reconstruction method on real-world examples. Its predicted physically grounded 3D deformation is accurate and can readily adapt for long-range 3D tracking tasks, achieving performance on par with state-of-the-art monocular video 3D tracking methods.
