Abstract
Diffusion large language models (dLLMs) generate text through iterative denoising, yet current decoding strategies discard rich intermediate predictions in favor of the final output. Our work reveals a critical phenomenon, temporal oscillation, in which correct answers often emerge at intermediate denoising steps but are overwritten in later ones. To address this issue, we introduce two complementary methods that exploit temporal consistency: 1) Temporal Self-Consistency Voting, a training-free, test-time decoding strategy that aggregates predictions across denoising steps to select the most consistent output; and 2) a post-training method termed Temporal Consistency Reinforcement, which uses Temporal Semantic Entropy (TSE), a measure of semantic stability across intermediate predictions, as a reward signal to encourage stable generations. Empirical results across multiple benchmarks demonstrate the effectiveness of our approach. Using the negative TSE reward alone, we observe a remarkable average improvement of 24.7% on the Countdown dataset over an existing dLLM. Combining it with the accuracy reward, we achieve absolute gains of 2.0% on GSM8K, 4.3% on MATH500, 6.6% on SVAMP, and 25.3% on Countdown. Our findings underscore the untapped potential of temporal dynamics in dLLMs and offer two simple yet effective tools to harness them.
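As a rough illustration of the two ideas named above, the following minimal sketch treats the answer decoded at each denoising step as a string. It picks the most frequent answer across steps and computes an entropy over the empirical distribution of answers. The abstract does not specify how votes are weighted or how semantic equivalence is judged, so unweighted voting and exact string matching are assumptions here, and the function names `temporal_vote` and `temporal_semantic_entropy` are hypothetical.

```python
import math
from collections import Counter


def temporal_vote(step_answers):
    """Return the answer that occurs most often across denoising steps.

    Assumption: every step gets an equal vote; the source does not state
    whether later steps should be weighted differently.
    """
    if not step_answers:
        raise ValueError("step_answers must not be empty")
    counts = Counter(step_answers)
    return counts.most_common(1)[0][0]


def temporal_semantic_entropy(step_answers):
    """Entropy (in nats) of the answer distribution across denoising steps.

    Assumption: two answers are 'semantically equivalent' only if their
    strings match exactly; a real system would likely cluster by meaning.
    """
    if not step_answers:
        raise ValueError("step_answers must not be empty")
    total = len(step_answers)
    counts = Counter(step_answers)
    # H = sum_k p_k * log(1 / p_k); 0 when all steps agree.
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Toy trajectory: the answer "18" appears mid-process but the final step
# reverts to "12", mimicking the temporal oscillation described above.
trajectory = ["12", "18", "18", "18", "12"]
voted = temporal_vote(trajectory)
tse = temporal_semantic_entropy(trajectory)
stable_tse = temporal_semantic_entropy(["18"] * 5)
```

In this toy trajectory the final-step answer is "12", while the vote recovers "18". The oscillating trajectory has positive entropy and the fully stable one has zero entropy, which is why a negative-entropy reward would favor stable generations.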