Abstract
We present PartComposer, a framework for part-level concept learning from single-image examples that enables text-to-image diffusion models to compose novel objects from meaningful components. Existing methods either struggle to learn fine-grained concepts effectively or require a large dataset as input. To address the data scarcity of the one-shot setting, we propose a dynamic data synthesis pipeline that generates diverse part compositions. Most importantly, we propose to maximize the mutual information between denoised latents and structured concept codes via a concept predictor, which enables direct regulation of concept disentanglement and provides supervision for re-composition. Our method achieves strong disentanglement and controllable composition, outperforming subject-level and part-level baselines when mixing concepts from the same or different object categories.
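The abstract does not state how the mutual-information term is computed. A common way to maximize mutual information through an auxiliary predictor is the InfoGAN-style variational lower bound, I(c; z) >= H(c) + E[log Q(c | z)], where Q is the predictor. The NumPy sketch below illustrates that bound under loudly stated assumptions: the "structured concept code" is modeled as one source-concept index per part slot, codes are sampled uniformly and independently per slot, and the predictor's logits are given. The function names and tensor shapes are hypothetical and are not taken from PartComposer.

```python
import numpy as np

def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def concept_prediction_loss(logits, codes):
    """Negative log-likelihood of the true concept codes under the predictor.

    logits: (batch, num_parts, num_sources) scores from a HYPOTHETICAL
            concept predictor applied to denoised latents.
    codes:  (batch, num_parts) integer index of the source concept used
            for each part slot (ASSUMED structure of the concept code).
    """
    logp = log_softmax(logits)
    b, p = codes.shape
    # Pick log Q(c_k | z) for the true code of every part slot k.
    picked = logp[np.arange(b)[:, None], np.arange(p)[None, :], codes]
    # Sum over part slots, average over the batch.
    return -picked.sum(axis=1).mean()

def mi_lower_bound(logits, codes, num_sources):
    # Variational bound: I(c; z) >= H(c) + E[log Q(c | z)].
    # ASSUMPTION: codes are uniform and independent per slot,
    # so H(c) = num_parts * log(num_sources).
    num_parts = codes.shape[1]
    entropy = num_parts * np.log(num_sources)
    return entropy - concept_prediction_loss(logits, codes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codes = rng.integers(0, 3, size=(4, 2))  # 4 samples, 2 part slots, 3 sources
    uninformed = np.zeros((4, 2, 3))         # predictor that ignores the latent
    confident = np.eye(3)[codes] * 50.0      # predictor that recovers every code
    print(mi_lower_bound(uninformed, codes, num_sources=3))
    print(mi_lower_bound(confident, codes, num_sources=3))
```

An uninformed predictor gives a bound close to zero, while a predictor that recovers every part's source concept approaches the maximum H(c) = 2 log 3. In training, minimizing `concept_prediction_loss` is equivalent to maximizing this bound, because H(c) is constant under the uniform-sampling assumption.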