Consistency Models - Paper Detail

Abstract

Diffusion models have made significant breakthroughs in image, audio, and video generation, but they depend on an iterative generation process that causes slow sampling speed and caps their potential for real-time applications. To overcome this limitation, we propose consistency models, a new family of generative models that achieve high sample quality without adversarial training. They support fast one-step generation by design, while still allowing for few-step sampling to trade compute for sample quality. They also support zero-shot data editing, like image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either as a way to distill pre-trained diffusion models, or as standalone generative models. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step generation. For example, we achieve the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained as standalone generative models, consistency models also outperform single-step, non-adversarial generative models on standard benchmarks like CIFAR-10, ImageNet 64x64 and LSUN 256x256.
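The trade-off between one-step and few-step sampling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the sampling procedure, so the loop below assumes the alternating denoise and re-noise scheme usually described for consistency models. Everything named here is hypothetical and chosen for illustration: the noise range `EPS` to `T_MAX`, the noising rule x_t = x_0 + t*z, and `toy_consistency_fn`, a closed-form stand-in for a trained network on a one-dimensional "dataset" that consists of the single point `MU`.

```python
import math
import random

# All constants below are assumptions for this toy example.
EPS = 0.002    # smallest noise level
T_MAX = 80.0   # largest noise level
MU = 1.5       # toy "dataset": every data point equals MU


def toy_consistency_fn(x, t):
    """Hypothetical stand-in for a trained model f(x, t).

    Assumes noisy samples are x_t = x_0 + t * z. With all data at MU,
    a noisy point traced back to noise level EPS lands at the value
    below. Note the boundary condition f(x, EPS) == x.
    """
    return MU + (EPS / t) * (x - MU)


def sample(consistency_fn, time_points=(), rng=random):
    """Draw one sample.

    With no time_points this is one-step generation (a single model
    call). Each extra time point adds one re-noise + denoise round,
    spending more compute in exchange for refinement.
    """
    x_T = rng.gauss(0.0, T_MAX)            # start from pure noise
    x = consistency_fn(x_T, T_MAX)         # one-step generation
    for tau in time_points:                # optional few-step refinement
        z = rng.gauss(0.0, 1.0)
        x_tau = x + math.sqrt(tau**2 - EPS**2) * z  # re-noise to level tau
        x = consistency_fn(x_tau, tau)              # denoise in one call
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    print("one-step:  ", sample(toy_consistency_fn, rng=rng))
    print("three-step:", sample(toy_consistency_fn, (20.0, 5.0), rng=rng))
```

In this toy setting both calls return values very close to `MU`, because the stand-in function is exact. With a real trained network, the extra time points are where additional compute would be traded for sample quality.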
