Abstract
We introduce GarmentCrafter, a new approach that enables non-professional users to create and modify 3D garments from a single-view image. While recent advances in image generation have facilitated 2D garment design, creating and editing 3D garments remains challenging for non-professional users. Existing methods for single-view 3D reconstruction often rely on pre-trained generative models to synthesize novel views conditioned on the reference image and camera pose, yet they lack cross-view consistency, failing to capture the internal relationships across different views. In this paper, we tackle this challenge through progressive depth prediction and image warping to approximate novel views. Subsequently, we train a multi-view diffusion model to complete occluded and unknown clothing regions, informed by the evolving camera pose. By jointly inferring RGB and depth, GarmentCrafter enforces inter-view coherence and reconstructs precise geometries and fine details. Extensive experiments demonstrate that our method achieves superior visual fidelity and inter-view coherence compared to state-of-the-art single-view 3D garment reconstruction methods.
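To make the depth-and-warp step concrete, the following is a minimal sketch of generic depth-based forward warping. It is not the GarmentCrafter implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a known relative pose `R`, `t` from the source to the target camera, and nearest-pixel splatting with a z-buffer. The function name `warp_to_novel_view` and all parameter names are illustrative. Pixels that receive no value are reported as holes, which correspond to the occluded or unknown regions a completion model would need to fill.

```python
def warp_to_novel_view(rgb, depth, fx, fy, cx, cy, R, t):
    """Forward-warp an RGB-D image into a target camera (illustrative sketch).

    rgb:   H x W nested list of colour tuples.
    depth: H x W nested list of depths along the camera z-axis (<= 0 means invalid).
    R, t:  3x3 rotation (nested list) and length-3 translation that map
           source-camera coordinates to target-camera coordinates.
    Returns (warped_rgb, warped_depth, known); known[y][x] is False for holes.
    """
    H, W = len(depth), len(depth[0])
    out_rgb = [[None] * W for _ in range(H)]
    out_depth = [[float("inf")] * W for _ in range(H)]
    for v in range(H):
        for u in range(W):
            d = depth[v][u]
            if d <= 0:
                continue  # skip pixels without a valid depth
            # Unproject pixel (u, v) to a 3D point in the source camera frame.
            p = ((u - cx) / fx * d, (v - cy) / fy * d, d)
            # Rigidly transform the point into the target camera frame.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if q[2] <= 1e-8:
                continue  # point lies behind (or on) the target camera plane
            # Project into the target image and snap to the nearest pixel.
            x = round(fx * q[0] / q[2] + cx)
            y = round(fy * q[1] / q[2] + cy)
            # Z-buffer test: keep only the nearest surface at each target pixel.
            if 0 <= x < W and 0 <= y < H and q[2] < out_depth[y][x]:
                out_depth[y][x] = q[2]
                out_rgb[y][x] = rgb[v][u]
    known = [[c is not None for c in row] for row in out_rgb]
    return out_rgb, out_depth, known
```

With an identity pose the warp returns the input unchanged. With a small sideways translation the content shifts and a band of target pixels is left unfilled, and `known` marks those pixels as `False`.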