Cascaded Diffusion Models for High Fidelity Image Generation

Abstract

We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher resolution details. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at 128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep, and classification accuracy scores of 63.02% (top-1) and 84.06% (top-5) at 256x256, outperforming VQ-VAE-2.
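As a rough illustration of the pipeline structure described above, the following toy sketch chains a base generator with super-resolution stages and builds a super-resolution training pair whose low-resolution conditioning input is augmented. It is not the paper's implementation: the models are random stand-ins rather than diffusion models, the names (toy_base_model, toy_sr_model, make_sr_training_pair) and the image sizes are hypothetical, and additive Gaussian noise is only an assumed form of conditioning augmentation, since the abstract does not specify which augmentation is used.

```python
import random


def downsample(image, factor=2):
    """Average-pool a square image (a list of rows) by `factor`."""
    n = len(image) // factor
    return [[sum(image[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(n)]
            for i in range(n)]


def upsample(image, factor=2):
    """Nearest-neighbor upsampling by `factor` along both axes."""
    return [[image[i // factor][j // factor]
             for j in range(len(image[0]) * factor)]
            for i in range(len(image) * factor)]


def augment_conditioning(low_res, rng, noise_std=0.1):
    """ASSUMPTION: Gaussian noise stands in for conditioning augmentation."""
    return [[px + rng.gauss(0.0, noise_std) for px in row] for row in low_res]


def make_sr_training_pair(high_res, rng, factor=2, noise_std=0.1):
    """Return (augmented low-res conditioning input, high-res target)."""
    low_res = downsample(high_res, factor)
    return augment_conditioning(low_res, rng, noise_std), high_res


def toy_base_model(rng, size=4):
    """Hypothetical stand-in for the lowest-resolution diffusion model."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def toy_sr_model(low_res, rng, factor=2):
    """Hypothetical stand-in for a super-resolution diffusion model:
    upsamples its conditioning input and adds small random detail."""
    return [[px + rng.gauss(0.0, 0.01) for px in row]
            for row in upsample(low_res, factor)]


def sample_cascade(rng, num_sr_stages=2):
    """Sample at the lowest resolution, then upsample stage by stage."""
    image = toy_base_model(rng)
    for _ in range(num_sr_stages):
        image = toy_sr_model(image, rng)
    return image


if __name__ == "__main__":
    rng = random.Random(0)
    sample = sample_cascade(rng, num_sr_stages=2)
    print(len(sample), "x", len(sample[0]))
```

In this sketch the augmentation is applied only when constructing training pairs, so each super-resolution stage is trained on conditioning inputs that are less clean than downsampled ground truth. The assumed intent is that the stage becomes more tolerant of imperfections in the samples it receives from the previous stage at sampling time.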
