HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance

Abstract

Automatic text-to-3D synthesis has achieved remarkable advancements through the optimization of 3D models. Existing methods commonly rely on pre-trained text-to-image generative models, such as diffusion models, which provide scores for 2D renderings of Neural Radiance Fields (NeRFs) that are then used to optimize the NeRFs. However, these methods often encounter artifacts and inconsistencies across multiple views due to their limited understanding of 3D geometry. To address these limitations, we propose a reformulation of the optimization loss using the diffusion prior. Furthermore, we introduce a novel training approach that unlocks the potential of the diffusion prior. To improve 3D geometry representation, we apply auxiliary depth supervision for NeRF-rendered images and regularize the density field of NeRFs. Extensive experiments demonstrate the superiority of our method over prior works, resulting in advanced photo-realism and improved multi-view consistency.
