Abstract
Text-to-image diffusion models trained on a fixed set of resolutions often fail to generalize, even when asked to generate images at lower resolutions than those seen during training. High-resolution text-to-image generators are currently unable to easily offer an out-of-the-box budget-efficient alternative to their users who might not need high-resolution images. We identify a key technical insight in diffusion models that, when addressed, can help tackle this limitation: noise schedulers have unequal perceptual effects across resolutions. The same level of noise removes disproportionately more signal from lower-resolution images than from high-resolution images, leading to a train-test mismatch. We propose NoiseShift, a training-free method that recalibrates the noise level of the denoiser conditioned on resolution size. NoiseShift requires no changes to model architecture or sampling schedule and is compatible with existing models. When applied to Stable Diffusion 3, Stable Diffusion 3.5, and Flux-Dev, quality at low resolutions is significantly improved. On LAION-COCO, NoiseShift improves SD3.5 by 15.89%, SD3 by 8.56%, and Flux-Dev by 2.44% in FID on average. On CelebA, NoiseShift improves SD3.5 by 10.36%, SD3 by 5.19%, and Flux-Dev by 3.02% in FID on average. These results demonstrate the effectiveness of NoiseShift in mitigating resolution-dependent artifacts and enhancing the quality of low-resolution image generation.