Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

  • 2025-11-10 18:43:07
  • Sean McLeish, Ang Li, John Kirchenbauer, Dayal Singh Kalra, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Jonas Geiping, Tom Goldstein, Micah Goldblum

Abstract

Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematics, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.
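The sketch below illustrates why a curriculum of recurrences can lower training cost. It is hypothetical and rests on stated assumptions. It assumes a linear ramp in the number of recurrences. It also assumes the model is split into a prelude, a recurrent core block, and a coda. The abstract specifies neither the schedule's shape nor any layer counts, so `recurrence_schedule`, `effective_depth`, `total_layer_passes`, and every default value are illustrative choices. Summing effective depth over training steps gives a crude proxy for training compute.

```python
from functools import partial
from typing import Callable


def recurrence_schedule(step: int, total_steps: int, start: int = 1, end: int = 32) -> int:
    """Hypothetical linear curriculum: ramp the recurrence count from `start` to `end`."""
    if total_steps <= 1:
        return end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    # int() floors non-negative values, so `end` is reached only on the final step.
    return start + int(frac * (end - start))


def effective_depth(num_recurrences: int, prelude_layers: int = 2,
                    core_layers: int = 4, coda_layers: int = 2) -> int:
    """Layers traversed in one forward pass, assuming a prelude / recurrent core / coda split."""
    return prelude_layers + num_recurrences * core_layers + coda_layers


def total_layer_passes(total_steps: int, schedule: Callable[[int], int]) -> int:
    """Crude compute proxy: sum of effective depths over all training steps."""
    return sum(effective_depth(schedule(s)) for s in range(total_steps))


if __name__ == "__main__":
    steps = 5
    # Curriculum: recurrences ramp 1 -> 9; baseline: always 9 recurrences.
    curriculum = partial(recurrence_schedule, total_steps=steps, start=1, end=9)
    print([curriculum(s) for s in range(steps)])             # [1, 3, 5, 7, 9]
    print(total_layer_passes(steps, curriculum))             # 120
    print(total_layer_passes(steps, lambda s: 9))            # 200
```

With five steps ramping from 1 to 9 recurrences, the curriculum traverses 120 layer passes versus 200 for a fixed 9 recurrences. The proxy ignores sequence length, attention cost, and backpropagation details, so it shows only the direction of the savings, not their size.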
