Abstract
We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image. SinDiffusion significantly improves the quality and diversity of generated samples compared with existing GAN-based approaches. It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale, instead of the multiple models with progressive growing of scales that serve as the default setting in prior work. This avoids the accumulation of errors, which causes characteristic artifacts in generated results. Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics; we therefore redesign the network structure of the diffusion model. Coupling these two designs enables us to generate photorealistic and diverse images from a single image. Furthermore, SinDiffusion can be applied to various applications, e.g., text-guided image generation and image outpainting, owing to the inherent capabilities of diffusion models. Extensive experiments on a wide range of images demonstrate the superiority of our proposed method for modeling the patch distribution.