Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs

Abstract

Diffusion Large Language Models (DLLMs) have emerged as a compelling alternative to autoregressive models, designed for fast parallel generation. However, existing DLLMs suffer from a severe quality-speed trade-off, where faster parallel decoding leads to significant performance degradation. We attribute this to the irreversibility of standard decoding in DLLMs: once a token is decoded it cannot be changed, so decoding is easily steered in the wrong direction as early errors accumulate in the context. To resolve this, we introduce Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable decoding in DLLMs. WINO employs a parallel draft-and-verify mechanism, aggressively drafting multiple tokens while simultaneously using the model's bidirectional context to verify and re-mask suspicious ones for refinement. Evaluated on open-source DLLMs such as LLaDA and MMaDA, WINO decisively improves the quality-speed trade-off. For instance, on the GSM8K math benchmark, it accelerates inference by 6$\times$ while improving accuracy by 2.58%; on Flickr30K captioning, it achieves a 10$\times$ speedup with higher performance. More comprehensive experiments are conducted to demonstrate the superiority of WINO and provide an in-depth understanding of it.
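
To make the draft-and-verify idea concrete, the following is a minimal toy sketch in Python, not the authors' implementation. Everything in it is an assumption made for illustration: `toy_predict` is a hypothetical stand-in for a DLLM forward pass, the two thresholds `tau_draft` and `tau_verify` are invented names and values, and the rule for re-masking (the held token disagrees with a fresh prediction, or its confidence falls below `tau_verify`) is one plausible reading of "verify and re-mask suspicious ones", since the abstract does not spell out the exact criterion.

```python
MASK = "<mask>"


def toy_predict(seq, target, hard):
    """Hypothetical stand-in for one DLLM forward pass (not a real model).

    Returns one (token, confidence) pair per position. While less than half
    of the sequence is unmasked, positions listed in `hard` get a wrong guess;
    with more context the guess becomes correct and confidence rises.
    """
    frac = sum(tok != MASK for tok in seq) / len(seq)
    preds = []
    for i, gold in enumerate(target):
        if i in hard and frac < 0.5:
            preds.append(("<wrong>", 0.6))
        else:
            preds.append((gold, 0.6 + 0.4 * frac))
    return preds


def draft_and_verify(target, hard, tau_draft=0.5, tau_verify=0.8, max_steps=10):
    """Toy revokable decoding loop; thresholds are illustrative assumptions."""
    seq = [MASK] * len(target)
    revoked = 0
    for step in range(1, max_steps + 1):
        # Wide-in: aggressively unmask every position that clears a loose threshold.
        preds = toy_predict(seq, target, hard)
        for i, (tok, conf) in enumerate(preds):
            if seq[i] == MASK and conf >= tau_draft:
                seq[i] = tok
        # Narrow-out: re-check drafted tokens against the fuller context and
        # re-mask any that disagree with the new prediction or look uncertain.
        checks = toy_predict(seq, target, hard)
        for i, (tok, conf) in enumerate(checks):
            if seq[i] != MASK and (seq[i] != tok or conf < tau_verify):
                seq[i] = MASK
                revoked += 1
        if MASK not in seq:
            return seq, step, revoked
    return seq, max_steps, revoked


if __name__ == "__main__":
    tokens = "a b c d e f g h".split()
    print(draft_and_verify(tokens, hard={2, 5}))
```

In this toy run, the first step drafts all eight positions at once, the verification pass revokes the two wrong drafts, and the second step refills them correctly with the richer context. An irreversible decoder with the same loose drafting threshold would have kept the two wrong tokens.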
