Diffusion Models in Vision: A Survey

Abstract

Denoising diffusion models represent a recently emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked with recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens, i.e., low speeds due to the high number of steps involved during sampling. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. First, we identify and present three generic diffusion modeling frameworks, which are based on denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.
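
To make the forward diffusion stage described above concrete, the following is a minimal sketch of gradually perturbing an input with Gaussian noise. It is not taken from the survey: the linear variance schedule, its endpoint values, the number of steps, and the function names `linear_beta_schedule` and `forward_diffusion` are all illustrative assumptions, and the update rule is the one commonly associated with denoising diffusion probabilistic models. The reverse stage is not shown, since it requires a trained model.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear variance schedule; the endpoints are assumed values."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_diffusion(x0, betas, rng):
    """Perturb x0 step by step with Gaussian noise.

    Each step applies x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps,
    with eps drawn from a standard normal distribution.
    Returns the list of all intermediate states, starting with x0.
    """
    x = list(x0)
    trajectory = [list(x)]
    for beta in betas:
        keep = math.sqrt(1.0 - beta)   # fraction of the previous state kept
        noise = math.sqrt(beta)        # scale of the freshly added noise
        x = [keep * v + noise * rng.gauss(0.0, 1.0) for v in x]
        trajectory.append(x)
    return trajectory


# Toy "input data": a constant signal standing in for flattened pixel values.
rng = random.Random(0)
x0 = [1.0] * 1000
betas = linear_beta_schedule(1000)
trajectory = forward_diffusion(x0, betas, rng)

# Summary statistics of the final state.
final = trajectory[-1]
final_mean = sum(final) / len(final)
final_var = sum((v - final_mean) ** 2 for v in final) / len(final)
```

Under these assumed settings, the final state should be close to a sample from a standard normal distribution, with almost no trace of the original signal left. The step-by-step loop also illustrates the computational burden mentioned above: sampling reverses the same chain one step at a time, which is why a high number of steps leads to low speeds.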
