Abstract
Diffusion models have demonstrated impressive image generation performance and have been used in various computer vision tasks. Unfortunately, image generation using diffusion models is very time-consuming since it requires thousands of sampling steps. To address this problem, we present a novel pyramidal diffusion model that generates high-resolution images starting from much coarser-resolution images using a single score function trained with a positional embedding. This enables time-efficient sampling for image generation and also solves the low batch size problem when training with limited resources. Furthermore, we show that the proposed approach can be used efficiently for multi-scale super-resolution problems using a single score function.