Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

Abstract

Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, which leads to suboptimal efficiency and limited flexibility. In this paper, we propose SlowFast Sampling, a novel dynamic sampling strategy that adaptively alternates between exploratory and accelerated decoding stages. Our method is guided by three golden principles (the certainty, convergence, and positional principles) that govern when and where tokens can be confidently and efficiently decoded. We further integrate our strategy with dLLM-Cache to reduce redundant computation. Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to a 15.63× speedup on LLaDA with minimal accuracy drop, and up to 34.22× when combined with caching. Notably, our approach outperforms strong autoregressive baselines such as LLaMA3 8B in throughput, demonstrating that well-designed sampling can unlock the full potential of dLLMs for fast, high-quality generation.
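The abstract names the three principles but does not define them. As a rough intuition only, the toy Python sketch below shows one way such criteria could be combined to pick which masked positions to commit at a single decoding step. Everything in it is an assumption for illustration: the function `choose_tokens_to_decode`, its parameters `window`, `conf_threshold`, and `stable_steps`, and the specific tests (a confidence threshold standing in for certainty, an unchanged recent prediction standing in for convergence, a window at the leftmost masked position standing in for the positional principle). It is not the paper's algorithm.

```python
from typing import List, Tuple


def choose_tokens_to_decode(
    confidences: List[float],   # top-1 probability per position at this step
    history: List[List[int]],   # recent argmax token ids per position
    masked: List[bool],         # True where the position is still masked
    window: int = 8,            # hypothetical positional window size
    conf_threshold: float = 0.9,  # hypothetical certainty threshold
    stable_steps: int = 2,      # hypothetical convergence horizon
) -> Tuple[str, List[int]]:
    """Toy slow/fast selection rule (hypothetical, not the paper's method)."""
    masked_idx = [i for i, m in enumerate(masked) if m]
    if not masked_idx:
        return "done", []

    # Positional stand-in: only consider masked positions inside a window
    # that starts at the leftmost still-masked position.
    start = masked_idx[0]
    candidates = [i for i in masked_idx if i < start + window]

    def is_certain(i: int) -> bool:
        # Certainty stand-in: the model's top-1 probability is high enough.
        return confidences[i] >= conf_threshold

    def has_converged(i: int) -> bool:
        # Convergence stand-in: the argmax has not changed over the last
        # `stable_steps` steps (requires at least that much history).
        recent = history[i][-stable_steps:]
        return len(recent) == stable_steps and len(set(recent)) == 1

    # Fast (accelerated) stage: commit every candidate meeting both tests.
    fast = [i for i in candidates if is_certain(i) and has_converged(i)]
    if fast:
        return "fast", fast

    # Slow (exploratory) stage: commit only the single most confident candidate.
    best = max(candidates, key=lambda i: confidences[i])
    return "slow", [best]


if __name__ == "__main__":
    history = [[5, 5], [7, 7], [2, 3], [9, 9], [1, 1]]
    masked = [False, True, True, True, True]
    print(choose_tokens_to_decode([0.99, 0.95, 0.40, 0.97, 0.30], history, masked, window=3))
    # -> ('fast', [1, 3])
    print(choose_tokens_to_decode([0.99, 0.60, 0.40, 0.70, 0.30], history, masked, window=3))
    # -> ('slow', [3])
```

In this toy version, a step falls back to committing one token when nothing in the window is both confident and stable, and commits several tokens at once otherwise. That mirrors the exploratory/accelerated alternation described above only in spirit; the paper's actual criteria and stage-switching logic may differ.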
