Abstract
We present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models to conditional image generation and performs super-resolution through a stochastic denoising process. Inference starts with pure Gaussian noise and iteratively refines the noisy output using a U-Net model trained on denoising at various noise levels. SR3 exhibits strong performance on super-resolution tasks at different magnification factors, on faces and natural images. We conduct human evaluation on a standard 8X face super-resolution task on CelebA-HQ, comparing with SOTA GAN methods. SR3 achieves a fool rate close to 50%, suggesting photo-realistic outputs, while GANs do not exceed a fool rate of 34%. We further show the effectiveness of SR3 in cascaded image generation, where generative models are chained with super-resolution models, yielding a competitive FID score of 11.3 on ImageNet.
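The iterative refinement described above can be illustrated with a minimal sketch of a standard DDPM-style ancestral sampling loop, conditioned on a low-resolution input. This is not the trained SR3 model. The function names (`make_schedule`, `oracle_denoiser`, `iterative_refinement`), the linear noise schedule, and the step count are all assumptions chosen for illustration, and the U-Net is replaced by a hypothetical stand-in that treats the conditioning image as the clean target. The conditioning image is assumed to be already upsampled to the target resolution.

```python
import numpy as np


def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Illustrative linear noise schedule (an assumption, not the paper's)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    gammas = np.cumprod(alphas)  # cumulative product of alphas up to step t
    return alphas, gammas


def oracle_denoiser(x_cond, y_noisy, gamma):
    """Hypothetical stand-in for the trained U-Net.

    It pretends the clean image equals x_cond and returns the noise that
    would explain y_noisy under y = sqrt(gamma) * x + sqrt(1 - gamma) * eps.
    A real model would learn this prediction from data.
    """
    return (y_noisy - np.sqrt(gamma) * x_cond) / np.sqrt(1.0 - gamma)


def iterative_refinement(x_cond, denoiser, alphas, gammas, rng):
    """Start from pure Gaussian noise and refine it step by step."""
    y = rng.standard_normal(x_cond.shape)
    for t in reversed(range(len(alphas))):
        eps_hat = denoiser(x_cond, y, gammas[t])
        # Remove the predicted noise component (DDPM posterior mean).
        y = (y - (1.0 - alphas[t]) / np.sqrt(1.0 - gammas[t]) * eps_hat)
        y = y / np.sqrt(alphas[t])
        # Re-inject a smaller amount of noise on every step except the last.
        if t > 0:
            y = y + np.sqrt(1.0 - alphas[t]) * rng.standard_normal(y.shape)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_cond = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy "upsampled" input
    alphas, gammas = make_schedule()
    y_out = iterative_refinement(x_cond, oracle_denoiser, alphas, gammas, rng)
    print(y_out.shape)
```

Because the stand-in denoiser knows the target exactly, the loop collapses onto the conditioning image at the final step. With a learned denoiser the same loop would instead produce a sample of a plausible high-resolution image, which is why the process is stochastic.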