A Survey on Cache Methods in Diffusion Models: Toward Efficient Multi-Modal Generation

Abstract

Diffusion Models have become a cornerstone of modern generative AI for their exceptional generation quality and controllability. However, their inherent \textit{multi-step iterations} and \textit{complex backbone networks} lead to prohibitive computational overhead and generation latency, forming a major bottleneck for real-time applications. Although existing acceleration techniques have made progress, they still face challenges such as limited applicability, high training costs, or quality degradation. Against this backdrop, \textbf{Diffusion Caching} offers a promising training-free, architecture-agnostic, and efficient inference paradigm. Its core mechanism identifies and reuses intrinsic computational redundancies in the diffusion process. By enabling feature-level cross-step reuse and inter-layer scheduling, it reduces computation without modifying model parameters. This paper systematically reviews the theoretical foundations and evolution of Diffusion Caching and proposes a unified framework for its classification and analysis. Through comparative analysis of representative methods, we show that Diffusion Caching evolves from \textit{static reuse} to \textit{dynamic prediction}. This trend enhances caching flexibility across diverse tasks and enables integration with other acceleration techniques such as sampling optimization and model distillation, paving the way for a unified, efficient inference framework for future multimodal and interactive applications. We argue that this paradigm will become a key enabler of real-time and efficient generative AI, injecting new vitality into both theory and practice of \textit{Efficient Generative Intelligence}.
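
As a rough illustration of the feature-level cross-step reuse described above, the following minimal sketch shows the simplest (static) form of caching: an expensive feature is recomputed only every few denoising steps and reused in between. It is a toy under stated assumptions, not any specific method from the survey. The names `run_with_static_cache`, `expensive_block`, `cheap_head`, and `refresh_interval` are hypothetical, and scalars stand in for the tensors and backbone layers of a real diffusion model.

```python
from typing import Callable, Optional, Tuple


def run_with_static_cache(
    expensive_block: Callable[[float, int], float],
    cheap_head: Callable[[float, float], float],
    x0: float,
    num_steps: int,
    refresh_interval: int,
) -> Tuple[float, int]:
    """Toy denoising loop with a fixed-interval feature cache.

    `expensive_block` is a hypothetical stand-in for the costly part of the
    backbone; `cheap_head` is a hypothetical stand-in for the light update
    that consumes its output. Returns the final sample and the number of
    times `expensive_block` was actually evaluated.
    """
    cached_feature: Optional[float] = None
    full_computes = 0
    x = x0
    for step in range(num_steps):
        # Static reuse: refresh the cache on a fixed schedule only.
        if cached_feature is None or step % refresh_interval == 0:
            cached_feature = expensive_block(x, step)
            full_computes += 1
        # On all other steps the stale feature is reused as-is.
        x = cheap_head(x, cached_feature)
    return x, full_computes
```

With `num_steps=10` and `refresh_interval=3`, the expensive block runs only at steps 0, 3, 6, and 9. Setting `refresh_interval=1` recovers the uncached loop. The reused feature is an approximation, so a larger interval trades output fidelity for fewer evaluations. Dynamic approaches would replace the fixed schedule with a decision or prediction made at run time.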
