Null-text Inversion for Editing Real Images using Guided Diffusion Models

Abstract

Recent text-guided diffusion models provide powerful image generation capabilities. Currently, a massive effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. To edit a real image using these state-of-the-art tools, one must first invert the image with a meaningful text prompt into the pretrained model's domain. In this paper, we introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image. Our proposed inversion consists of two novel key components: (i) Pivotal inversion for diffusion models. While current methods aim at mapping random noise samples to a single input image, we use a single pivotal noise vector for each timestamp and optimize around it. We demonstrate that a direct inversion is inadequate on its own, but does provide a good anchor for our optimization. (ii) NULL-text optimization, where we only modify the unconditional textual embedding that is used for classifier-free guidance, rather than the input text embedding. This allows for keeping both the model weights and the conditional embedding intact and hence enables applying prompt-based editing while avoiding the cumbersome tuning of the model's weights. Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing, showing high-fidelity editing of real images.
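
To make component (ii) more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the standard classifier-free guidance combination (unconditional prediction plus a scaled difference between conditional and unconditional predictions) and shows that a guided prediction can be pulled toward a fixed target by updating only the unconditional embedding, leaving the conditional embedding untouched. The linear predictor `toy_eps`, the `target` vector (standing in for one step of the pivotal trajectory), the guidance scale, the learning rate, and all numeric values are hypothetical choices made for illustration.

```python
# Toy sketch: optimize only the unconditional ("null") embedding under
# classifier-free guidance. Everything here is a hypothetical stand-in for
# the real diffusion model; it uses plain Python lists and no libraries.

def toy_eps(z, emb):
    """Hypothetical linear noise predictor: eps = z + emb (elementwise)."""
    return [zi + ei for zi, ei in zip(z, emb)]

def guided_eps(z, null_emb, cond_emb, w):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    e_u = toy_eps(z, null_emb)
    e_c = toy_eps(z, cond_emb)
    return [u + w * (c - u) for u, c in zip(e_u, e_c)]

def sq_error(a, b):
    """Sum of squared differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def optimize_null(z, null_emb, cond_emb, target, w=7.5, lr=0.01, steps=50):
    """Gradient descent on the null embedding only; cond_emb is never modified."""
    null_emb = list(null_emb)
    for _ in range(steps):
        pred = guided_eps(z, null_emb, cond_emb, w)
        # For this toy model, d(pred_i)/d(null_i) = (1 - w), so the gradient
        # of the squared error with respect to null_i is 2 * (1 - w) * residual_i.
        grad = [2.0 * (1.0 - w) * (p - t) for p, t in zip(pred, target)]
        null_emb = [n - lr * g for n, g in zip(null_emb, grad)]
    return null_emb

z = [0.5, -1.0, 2.0]          # hypothetical latent at one step
cond_emb = [1.0, 0.0, -1.0]   # conditional (prompt) embedding, kept fixed
null_init = [0.0, 0.0, 0.0]   # initial unconditional embedding
target = [0.2, 0.4, -0.3]     # stand-in for the pivot we want to match

before = sq_error(guided_eps(z, null_init, cond_emb, 7.5), target)
null_opt = optimize_null(z, null_init, cond_emb, target)
after = sq_error(guided_eps(z, null_opt, cond_emb, 7.5), target)
print(before, after)
```

In this toy setting the error shrinks toward zero while `cond_emb` stays exactly as given, which mirrors the abstract's point that the conditional embedding and the model weights remain intact. A real implementation would replace `toy_eps` with the pretrained network and the target with the latent obtained from the pivotal inversion.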
