Looped Transformers for Length Generalization

Abstract

Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same length, they struggle with length generalization, i.e., handling inputs of unseen lengths. In this work, we demonstrate that looped Transformers with an adaptive number of steps significantly improve length generalization. We focus on tasks with a known iterative solution, involving multiple iterations of a RASP-L operation, a length-generalizable operation that can be expressed by a finite-sized Transformer. We train looped Transformers using our proposed learning algorithm and observe that they learn highly length-generalizable solutions for various tasks.
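
To make the idea of a "known iterative solution" concrete, the following is a minimal plain-Python sketch of parity computed by applying one fixed update repeatedly, with the number of repetitions set by the input length. It is only an illustration of the looping structure: the names `step` and `looped_parity` are hypothetical, the update is a stand-in for the repeated operation rather than an actual RASP-L program, and nothing here reflects the authors' model or training algorithm.

```python
from typing import List, Tuple

# Assumed state layout for this sketch: (index of next unread bit, running parity).
State = Tuple[int, int]


def step(bits: List[int], state: State) -> State:
    """One fixed update that does not depend on the input length:
    consume a single bit and flip the running parity if that bit is 1."""
    pos, parity = state
    if pos >= len(bits):
        # Nothing left to read; the state is unchanged.
        return state
    return pos + 1, parity ^ bits[pos]


def looped_parity(bits: List[int]) -> int:
    """Apply the same step repeatedly; the step count adapts to the input length."""
    state: State = (0, 0)
    for _ in range(len(bits)):
        state = step(bits, state)
    return state[1]


if __name__ == "__main__":
    print(looped_parity([1, 0, 1, 1]))
```

Because the per-step update is the same for every input, handling a longer input only requires running more steps, which is the intuition behind pairing a looped model with an adaptive number of steps.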
