PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models

  • 2024-12-24 18:59:43
  • Minghao Chen, Roman Shapovalov, Iro Laina, Tom Monnier, Jianyuan Wang, David Novotny, Andrea Vedaldi

Abstract

Text- or image-to-3D generators and 3D scanners can now produce 3D assets with high-quality shapes and textures. These assets typically consist of a single, fused representation, like an implicit neural field, a Gaussian mixture, or a mesh, without any useful structure. However, most applications and creative workflows require assets to be made of several meaningful parts that can be manipulated independently. To address this gap, we introduce PartGen, a novel approach that generates 3D objects composed of meaningful parts starting from text, an image, or an unstructured 3D object. First, given multiple views of a 3D object, generated or rendered, a multi-view diffusion model extracts a set of plausible and view-consistent part segmentations, dividing the object into parts. Then, a second multi-view diffusion model takes each part separately, fills in the occlusions, and uses those completed views for 3D reconstruction by feeding them to a 3D reconstruction network. This completion process considers the context of the entire object to ensure that the parts integrate cohesively. The generative completion model can make up for the information missing due to occlusions; in extreme cases, it can hallucinate entirely invisible parts based on the input 3D asset. We evaluate our method on generated and real 3D assets and show that it outperforms segmentation and part-extraction baselines by a large margin. We also showcase downstream applications such as 3D part editing.
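
The stages described in the abstract can be read as a simple control flow: segment the views, then complete and reconstruct each part in turn. The following is only a minimal sketch of that flow, not the authors' implementation. The names `generate_parts`, `segment_views`, `complete_part`, and `reconstruct_3d` are hypothetical, and the toy functions stand in for the two multi-view diffusion models and the reconstruction network so that the example runs.

```python
from typing import Callable, Dict, List

Image = List[List[int]]      # toy stand-in for one rendered view
LabelMap = List[List[int]]   # per-pixel part id, 0 = background


def generate_parts(
    views: List[Image],
    segment_views: Callable[[List[Image]], List[LabelMap]],
    complete_part: Callable[[List[Image], List[LabelMap]], List[Image]],
    reconstruct_3d: Callable[[List[Image]], object],
) -> Dict[int, object]:
    """Hypothetical control flow: segment, then complete and reconstruct each part."""
    # Stage 1: view-consistent segmentation (part ids are shared across views).
    label_maps = segment_views(views)
    part_ids = sorted(
        {pid for lm in label_maps for row in lm for pid in row if pid != 0}
    )

    parts: Dict[int, object] = {}
    for pid in part_ids:
        # Binary mask of this part in every view.
        masks = [
            [[1 if p == pid else 0 for p in row] for row in lm]
            for lm in label_maps
        ]
        # Stage 2: completion receives the full views (object context) and the masks.
        completed = complete_part(views, masks)
        # Stage 3: reconstruct the part from its completed views.
        parts[pid] = reconstruct_3d(completed)
    return parts


# Toy stand-ins so the sketch runs; they are NOT real models (assumptions only).
def toy_segment(views):
    # Treat each pixel value as its part id.
    return [[row[:] for row in v] for v in views]


def toy_complete(views, masks):
    # Keep only masked pixels; a real model would also fill occluded regions.
    return [
        [[px * m for px, m in zip(vr, mr)] for vr, mr in zip(v, mk)]
        for v, mk in zip(views, masks)
    ]


def toy_reconstruct(completed):
    # Return the number of visible part pixels instead of a 3D asset.
    return sum(1 for v in completed for row in v for px in row if px != 0)


views = [[[0, 1], [2, 2]], [[1, 1], [0, 2]]]
parts = generate_parts(views, toy_segment, toy_complete, toy_reconstruct)
```

Passing the stages in as callables keeps the sketch independent of any particular model. The completion step is handed both the unmasked views and the part masks, mirroring the abstract's statement that completion considers the context of the entire object.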

 
