Magic3D: High-Resolution Text-to-3D Content Creation

Abstract

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image-space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior and accelerate the optimization with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. In user studies, 61.7% of raters preferred our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.
