Training-free Diffusion Acceleration with Bottleneck Sampling

Abstract

Diffusion models have demonstrated remarkable capabilities in visual content generation but remain challenging to deploy due to their high computational cost during inference. This computational burden primarily arises from the quadratic complexity of self-attention with respect to image or video resolution. While existing acceleration methods often compromise output quality or necessitate costly retraining, we observe that most diffusion models are pre-trained at lower resolutions, presenting an opportunity to exploit these low-resolution priors for more efficient inference without degrading performance. In this work, we introduce Bottleneck Sampling, a training-free framework that leverages low-resolution priors to reduce computational overhead while preserving output fidelity. Bottleneck Sampling follows a high-low-high denoising workflow: it performs high-resolution denoising in the initial and final stages while operating at lower resolutions in intermediate steps. To mitigate aliasing and blurring artifacts, we further refine the resolution transition points and adaptively shift the denoising timesteps at each stage. We evaluate Bottleneck Sampling on both image and video generation tasks, where extensive experiments demonstrate that it accelerates inference by up to 3$\times$ for image generation and 2.5$\times$ for video generation, all while maintaining output quality comparable to the standard full-resolution sampling process across multiple evaluation metrics. Code is available at: https://github.com/tyfeld/Bottleneck-Sampling
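
To make the cost argument concrete, the following is a rough back-of-the-envelope sketch, not the authors' implementation. It assumes that self-attention cost grows with the square of the token count and that the token count is proportional to image area. The patch size, the resolutions, and the step split in `bottleneck` are hypothetical values chosen for illustration; they are not the settings used in the paper, and the sketch ignores every cost other than attention.

```python
def attention_cost(height, width, patch=16):
    """Relative self-attention cost, assumed quadratic in the token count.

    `patch` is a hypothetical patch size; tokens = (H / patch) * (W / patch).
    """
    tokens = (height // patch) * (width // patch)
    return tokens ** 2


def schedule_cost(stages):
    """Total attention cost over a list of (height, width, num_steps) stages."""
    return sum(attention_cost(h, w) * steps for h, w, steps in stages)


# Hypothetical schedules with the same total number of denoising steps.
# Baseline: every step at full resolution.
baseline = [(1024, 1024, 30)]
# High-low-high: full resolution only in the first and last stages.
bottleneck = [(1024, 1024, 5), (512, 512, 20), (1024, 1024, 5)]

speedup = schedule_cost(baseline) / schedule_cost(bottleneck)
print(f"attention-only speedup estimate: {speedup:.2f}x")
```

Under these assumptions, halving each side of the image cuts the per-step attention cost by a factor of 16, so the intermediate low-resolution steps contribute little to the total and the high-resolution stages dominate. Real speedups also depend on non-attention layers and on the resampling performed at the transition points.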
