Distilling Multi-view Diffusion Models into 3D Generators

Abstract

We introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using Gaussian splatting. DD3G compresses and integrates extensive visual and spatial geometric knowledge from the MV-DM by simulating its ordinary differential equation (ODE) trajectory, ensuring the distilled generator generalizes better than those trained solely on 3D data. Unlike previous amortized optimization approaches, we align the MV-DM and 3D generator representation spaces to transfer the teacher's probabilistic flow to the student, thus avoiding inconsistencies in optimization objectives caused by probabilistic sampling. The introduction of probabilistic flow and the coupling of various attributes in 3D Gaussians introduce challenges in the generation process. To tackle this, we propose PEPD, a generator consisting of Pattern Extraction and Progressive Decoding phases, which enables efficient fusion of probabilistic flow and converts a single image into 3D Gaussians within 0.06 seconds. Furthermore, to reduce knowledge loss and overcome sparse-view supervision, we design a joint optimization objective that ensures the quality of generated samples through explicit supervision and implicit verification. Leveraging existing 2D generation models, we compile 120k high-quality RGBA images for distillation. Experiments on synthetic and public datasets demonstrate the effectiveness of our method. Our project is available at: https://qinbaigao.github.io/DD3G_project/
