Diffusion Self-Guidance for Controllable Image Generation

Abstract

Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, location, and appearance of objects can be extracted from these representations and used to steer sampling. Self-guidance works similarly to classifier guidance, but uses signals present in the pretrained model itself, requiring no additional models or training. We show how a simple set of properties can be composed to perform challenging image manipulations, such as modifying the position or size of objects, merging the appearance of objects in one image with the layout of another, composing objects from many images into one, and more. We also show that self-guidance can be used to edit real images. For results and an interactive demo, see our project page at https://dave.ml/selfguidance/
