Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts

Abstract

Generative models have demonstrated remarkable potential in generating visually impressive content from textual descriptions. However, training these models on unfiltered internet data poses the risk of learning and subsequently propagating undesirable concepts, such as copyrighted or unethical content. In this paper, we propose a novel method to remove undesirable concepts from text-to-image generative models by incorporating a learnable prompt into the cross-attention module. This learnable prompt acts as an additional memory: knowledge of the undesirable concepts is transferred into it, reducing the dependency of these concepts on the model parameters and the corresponding textual inputs. Because of this knowledge transfer into the prompt, erasing these undesirable concepts is more stable and has minimal negative impact on other concepts. We demonstrate the effectiveness of our method on the Stable Diffusion model, showing that it outperforms state-of-the-art erasure methods at removing undesirable content while preserving other, unrelated elements.
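The abstract does not specify exactly how the learnable prompt is wired into cross-attention. One plausible reading, and the assumption made by this minimal NumPy sketch, is a prefix-style design: a small set of prompt vectors is concatenated with the text embeddings, so the prompt contributes extra keys and values that image queries can attend to as "additional memory". The function name `cross_attention_with_prompt`, the shapes, and the single-head layout are hypothetical illustrations, not the paper's implementation, and the training procedure is not shown.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_with_prompt(image_tokens, text_tokens, prompt_tokens, W_q, W_k, W_v):
    """Single-head cross-attention with hypothetical learnable prompt tokens.

    Assumption: the prompt tokens live in the text-embedding space and are
    prepended to the text tokens before the key/value projections.
    """
    context = np.concatenate([prompt_tokens, text_tokens], axis=0)  # (P + T, d_text)
    q = image_tokens @ W_q   # queries from image features: (N, d)
    k = context @ W_k        # keys from prompt + text: (P + T, d)
    v = context @ W_v        # values from prompt + text: (P + T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product: (N, P + T)
    attn = softmax(scores, axis=-1)          # each image token's weights sum to 1
    return attn @ v, attn


# Toy example with random weights; in training, prompt_tokens would be a
# learnable parameter (which other parameters are updated is not shown here).
rng = np.random.default_rng(0)
N, T, P = 5, 3, 2           # image tokens, text tokens, prompt tokens
d_img, d_text, d = 8, 6, 4  # feature widths (illustrative)
image_tokens = rng.normal(size=(N, d_img))
text_tokens = rng.normal(size=(T, d_text))
prompt_tokens = rng.normal(size=(P, d_text))
W_q = rng.normal(size=(d_img, d))
W_k = rng.normal(size=(d_text, d))
W_v = rng.normal(size=(d_text, d))

out, attn = cross_attention_with_prompt(image_tokens, text_tokens, prompt_tokens, W_q, W_k, W_v)
prompt_share = attn[:, :P].sum(axis=1)  # fraction of attention each image token puts on the prompt
```

Under this assumption, `prompt_share` gives a rough view of how much each image token relies on the prompt memory rather than on the text tokens; it is a diagnostic for the sketch, not a metric from the paper.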
