ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

Abstract

Diffusion models have revolutionized image generation in recent years, yet they are still limited to a few sizes and aspect ratios. We propose ElasticDiffusion, a novel training-free decoding method that enables pretrained text-to-image diffusion models to generate images with various sizes. ElasticDiffusion attempts to decouple the generation trajectory of a pretrained model into local and global signals. The local signal controls low-level pixel information and can be estimated on local patches, while the global signal is used to maintain overall structural consistency and is estimated with a reference image. We test our method on CelebA-HQ (faces) and LAION-COCO (objects/indoor/outdoor scenes). Our experiments and qualitative results show superior image coherence quality across aspect ratios compared to MultiDiffusion and the standard decoding strategy of Stable Diffusion. Project page: https://elasticdiffusion.github.io/
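
To make the global-local separation concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single-channel 2D grid standing in for a latent, a local estimate computed independently on non-overlapping patches, and a global estimate computed on a downsampled copy (a stand-in for the reference image) and then upsampled. All function names, the average-pool/nearest-neighbour resampling, and the additive combination rule are hypothetical choices made for illustration; the abstract does not specify them.

```python
def tiles(h, w, patch):
    """Yield (top, left, bottom, right) boxes that tile an h x w grid."""
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            yield top, left, min(top + patch, h), min(left + patch, w)


def local_signal(latent, patch_fn, patch):
    """Apply patch_fn to each patch on its own and stitch the results."""
    h, w = len(latent), len(latent[0])
    out = [[0.0] * w for _ in range(h)]
    for t, l, b, r in tiles(h, w, patch):
        tile = [row[l:r] for row in latent[t:b]]
        for i, row in enumerate(patch_fn(tile)):
            out[t + i][l:r] = row
    return out


def global_signal(latent, global_fn, factor):
    """Average-pool by `factor`, apply global_fn, upsample back (nearest)."""
    h, w = len(latent), len(latent[0])
    assert h % factor == 0 and w % factor == 0, "toy sketch needs exact tiling"
    small = [
        [
            sum(latent[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(w // factor)
        ]
        for i in range(h // factor)
    ]
    est = global_fn(small)
    return [[est[i // factor][j // factor] for j in range(w)] for i in range(h)]


def combine(local, glob, weight):
    """Hypothetical combination rule: local + weight * global."""
    return [[a + weight * b for a, b in zip(lr, gr)]
            for lr, gr in zip(local, glob)]


def remove_mean(tile):
    """Toy local estimator: keep only within-patch detail."""
    n = sum(len(row) for row in tile)
    mean = sum(map(sum, tile)) / n
    return [[v - mean for v in row] for row in tile]


# Toy usage on a 4 x 6 grid (a non-square size), with 2 x 2 patches.
latent = [[float(i * 6 + j) for j in range(6)] for i in range(4)]
local = local_signal(latent, remove_mean, patch=2)
glob = global_signal(latent, lambda g: g, factor=2)  # coarse structure only
combined = combine(local, glob, weight=1.0)
```

In this toy setup the local term carries per-patch detail and the global term carries the coarse layout, so adding them with weight 1.0 recovers the original grid. In a real pipeline both estimators would be calls to a pretrained diffusion model, which this sketch does not attempt to reproduce.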
