Two-Scale Latent Dynamics for Recurrent-Depth Transformers

  • 2025-11-13 16:51:26
  • Francesco Pappone, Donato Crisostomi, Emanuele RodolĂ 
  • 0

Abstract

Recurrent-depth transformers scale test-time compute by iterating latent computations before emitting tokens. We study the geometry of these iterates and argue for a simple, two-scale operational picture: (i) within a looped block, updates act as small-scale refinements; (ii) across consecutive blocks, states undergo a larger-scale drift. Across training, our measurements show that loop steps become smaller and increasingly orthogonal to one another, indicating better local modeling of fine structure rather than merely pushing in a single direction. These dynamics motivate an early-exit mechanism based on the model's second-order difference in step-size, which we show is superior in terms of performance, stability and time-efficiency, when compared to the KL-divergence exit strategy of Geiping et al. and its naive first-order counterpart.

 

Quick Read (beta)

loading the full paper ...