Abstract
In 3D modeling, designers often use an existing 3D model as a reference to create new ones. This practice has inspired the development of Phidias, a novel generative model that uses diffusion for reference-augmented 3D generation. Given an image, our method leverages a retrieved or user-provided 3D reference model to guide the generation process, thereby enhancing generation quality, generalization ability, and controllability. Our model integrates three key components: 1) meta-ControlNet, which dynamically modulates the conditioning strength, 2) dynamic reference routing, which mitigates misalignment between the input image and the 3D reference, and 3) self-reference augmentations, which enable self-supervised training with a progressive curriculum. Collectively, these designs result in a clear improvement over existing methods. Phidias establishes a unified framework for 3D generation using text, image, and 3D conditions, with versatile applications.