Abstract
Length generalization, the ability to solve problems of longer sequences thanthose observed during training, poses a core challenge of Transformer-basedlarge language models (LLM). Although existing studies have predominantlyfocused on data-driven approaches for arithmetic operations and symbolicmanipulation tasks, these approaches tend to be task-specific with limitedoverall performance. To pursue a more general solution, this paper focuses on abroader case of reasoning problems that are computable, i.e., problems thatalgorithms can solve, thus can be solved by the Turing Machine. From thisperspective, this paper proposes Turing MAchine Imitation Learning (TAIL) toimprove the length generalization ability of LLMs. TAIL synthesizeschain-of-thoughts (CoT) data that imitate the execution process of a TuringMachine by computer programs, which linearly expands the reasoning steps intoatomic states to alleviate shortcut learning and explicit memory fetchmechanism to reduce the difficulties of dynamic and long-range data access inelementary operations. To validate the reliability and universality of TAIL, weconstruct a challenging synthetic dataset covering 8 classes of algorithms and18 tasks. Without bells and whistles, TAIL significantly improves the lengthgeneralization ability as well as the performance of Qwen2.5-7B on varioustasks using only synthetic data, surpassing previous methods and DeepSeek-R1.The experimental results reveal that the key concepts in the Turing Machine,instead of the thinking styles, are indispensable for TAIL for lengthgeneralization, through which the model exhibits read-and-write behaviorsconsistent with the properties of the Turing Machine in their attention layers.This work provides a promising direction for future research in the learning ofLLM reasoning from synthetic data.