ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction

Abstract

While personalized text-to-image generation has enabled the learning of a single concept from multiple images, a more practical yet challenging scenario involves learning multiple concepts within a single image. However, existing works tackling this scenario heavily rely on extensive human annotations. In this paper, we introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts. Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models. To achieve this, we present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects. Specifically, a concept localization approach automatically locates and disentangles salient concepts by leveraging spatial correspondence from diffusion self-attention; and based on the lookup association between a concept and a conceptual token, a concept-wise optimization process learns discriminative tokens that represent each individual concept. Finally, we establish an evaluation protocol tailored for the UCE task. Extensive experiments demonstrate that ConceptExpress is a promising solution to the UCE task. Our code and data are available at: https://github.com/haoosz/ConceptExpress
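
To make the localization idea concrete, the sketch below illustrates how spatial correspondence in a self-attention map can separate regions without labels. It is a toy illustration, not the ConceptExpress procedure: the attention matrix is synthetic rather than taken from a diffusion model, plain k-means is swapped in as a stand-in clustering step, and the function names `toy_self_attention` and `cluster_locations` are hypothetical.

```python
import numpy as np

def toy_self_attention(labels):
    """Build a synthetic (N, N) self-attention matrix from a label grid.

    Assumption: each location attends mostly to locations in its own region,
    with a small constant leakage to all other locations.
    """
    flat = labels.ravel()
    same = (flat[:, None] == flat[None, :]).astype(float)
    attn = same + 0.05                      # leakage across regions
    return attn / attn.sum(axis=1, keepdims=True)  # rows sum to 1

def cluster_locations(attn, k, iters=20):
    """Group locations by the similarity of their attention rows (k-means)."""
    # Farthest-point initialisation keeps the toy example deterministic.
    centers = [attn[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(attn - c, axis=1) for c in centers], axis=0)
        centers.append(attn[np.argmax(d)])
    centers = np.stack(centers)
    for _ in range(iters):
        dists = np.linalg.norm(attn[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = attn[assign == j].mean(axis=0)
    return assign

# A 4x4 "image" whose left and right halves are two different concepts.
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1
attn = toy_self_attention(labels)
masks = cluster_locations(attn, k=2).reshape(labels.shape)
print(masks)
```

In this toy setting the recovered cluster map coincides with the two halves of the grid. The number of clusters `k` is supplied by hand here, which is a simplification; the abstract describes the localization as automatic.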
