A3D: Does Diffusion Dream about 3D Alignment?

Abstract

We tackle the problem of text-driven 3D generation from a geometry alignment perspective. Given a set of text prompts, we aim to generate a collection of objects with semantically corresponding parts aligned across them. Recent methods based on Score Distillation have succeeded in distilling the knowledge from 2D diffusion models to high-quality representations of the 3D objects. These methods handle multiple text queries separately, and therefore the resulting objects have a high variability in object pose and structure. However, in some applications, such as 3D asset design, it may be desirable to obtain a set of objects aligned with each other. In order to achieve the alignment of the corresponding parts of the generated objects, we propose to embed these objects into a common latent space and optimize the continuous transitions between these objects. We enforce two kinds of properties of these transitions: smoothness of the transition and plausibility of the intermediate objects along the transition. We demonstrate that both of these properties are essential for good alignment. We provide several practical scenarios that benefit from alignment between the objects, including 3D editing and object hybridization, and experimentally demonstrate the effectiveness of our method.
https://voyleg.github.io/a3d/
