Accelerate High-Quality Diffusion Models with Inner Loop Feedback

Abstract

We propose Inner Loop Feedback (ILF), a novel approach to accelerate diffusion models' inference. ILF trains a lightweight module to predict future features in the denoising process by leveraging the outputs from a chosen diffusion backbone block at a given time step. This approach exploits two key intuitions: (1) the outputs of a given block at adjacent time steps are similar, and (2) performing partial computations for a step imposes a lower burden on the model than skipping the step entirely. Our method is highly flexible, since we find that the feedback module itself can simply be a block from the diffusion backbone, with all settings copied. Its influence on the diffusion forward pass can be tempered with a learnable scaling factor initialized to zero. We train this module using distillation losses; however, unlike some prior work where a full diffusion backbone serves as the student, our model freezes the backbone, training only the feedback module. While many efforts to optimize diffusion models focus on achieving acceptable image quality in extremely few steps (1-4 steps), our emphasis is on matching best-case results (typically achieved in 20 steps) while significantly reducing runtime. ILF achieves this balance effectively, demonstrating strong performance for both class-to-image generation with diffusion transformer (DiT) and text-to-image generation with DiT-based PixArt-alpha and PixArt-sigma. The quality of ILF's 1.7x-1.8x speedups is confirmed by FID, CLIP score, CLIP Image Quality Assessment, ImageReward, and qualitative comparisons. Project information is available at https://mgwillia.github.io/ilf.
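The zero-initialized scaling factor can be illustrated with a toy sketch. This is not the authors' implementation: the affine "blocks", the point where the feedback is added, and the names `make_toy_block`, `forward_plain`, `forward_with_feedback`, `alpha`, and `tap_index` are all assumptions made for illustration. It only shows the property the abstract relies on, namely that a feedback path scaled by a factor starting at zero leaves the frozen backbone's output unchanged at the start of training.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(weight: float, bias: float) -> Block:
    """Hypothetical stand-in for one backbone block: an elementwise affine map."""
    def block(x: Vector) -> Vector:
        return [weight * v + bias for v in x]
    return block


def forward_plain(x: Vector, blocks: List[Block]) -> Vector:
    """Run the (frozen) backbone blocks in order, with no feedback."""
    h = x
    for block in blocks:
        h = block(h)
    return h


def forward_with_feedback(
    x: Vector,
    blocks: List[Block],
    feedback: Block,
    alpha: float,
    tap_index: int,
) -> Vector:
    """Run the backbone, adding alpha * feedback(h) after block `tap_index`.

    Assumption: the feedback is injected additively at the tapped block's
    output. The abstract does not specify the injection point.
    """
    h = x
    for i, block in enumerate(blocks):
        h = block(h)
        if i == tap_index:
            fb = feedback(h)
            h = [hv + alpha * fv for hv, fv in zip(h, fb)]
    return h


if __name__ == "__main__":
    blocks = [make_toy_block(0.5, 0.1), make_toy_block(2.0, -0.3), make_toy_block(1.5, 0.0)]
    feedback = blocks[1]  # the feedback module reuses a backbone block's settings
    x = [1.0, -2.0, 0.5]
    print(forward_plain(x, blocks))
    print(forward_with_feedback(x, blocks, feedback, alpha=0.0, tap_index=1))
```

With `alpha` at 0.0 the two forward passes agree exactly, so training can start from the unmodified backbone's behavior and let the feedback contribution grow only as the scaling factor is learned.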
