On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding

  • 2025-05-02 18:13:48
  • Kevin Xu, Issei Sato

Abstract

Looped Transformers provide advantages in parameter efficiency, computational capabilities, and generalization for reasoning tasks. However, their expressive power regarding function approximation remains underexplored. In this paper, we establish the approximation rate of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. This reveals a limitation specific to the looped architecture. To address it, the analysis motivates incorporating scaling parameters for each loop, conditioned on timestep encoding. Experiments validate the theoretical results, showing that increasing the number of loops enhances performance, with further gains achieved through the timestep encoding.
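The abstract does not spell out the exact parameterization, so the following is only a minimal NumPy sketch of the idea under stated assumptions. One shared single-head self-attention block (no MLP, layer norm, or masking) is reused in every loop. The loop index t is mapped to a sinusoidal timestep encoding, and a hypothetical linear map `W_scale` turns that encoding into a per-loop scaling vector gamma_t applied to the residual update x <- x + gamma_t * f(x). The names `LoopedTransformerSketch`, `timestep_encoding`, and `W_scale` are illustrative and are not the authors' code.

```python
import numpy as np


def timestep_encoding(t, dim):
    # Assumption: Transformer-style sinusoidal encoding of the loop index t.
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class LoopedTransformerSketch:
    """One shared attention block applied num_loops times (illustrative only)."""

    def __init__(self, d_model, d_enc, seed=0):
        assert d_enc % 2 == 0, "sinusoidal encoding needs an even dimension"
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        # Shared parameters: the SAME weights are reused at every loop.
        self.Wq = rng.normal(0.0, s, (d_model, d_model))
        self.Wk = rng.normal(0.0, s, (d_model, d_model))
        self.Wv = rng.normal(0.0, s, (d_model, d_model))
        # Hypothetical map from the timestep encoding to a per-loop scale.
        self.W_scale = rng.normal(0.0, s, (d_enc, d_model))
        self.d_enc = d_enc

    def block(self, x):
        # Single-head self-attention; x has shape (seq_len, d_model).
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
        return attn @ v

    def __call__(self, x, num_loops, use_timestep=True):
        for t in range(num_loops):
            if use_timestep:
                # gamma_t = 1 + enc(t) @ W_scale (hypothetical parameterization).
                gamma = 1.0 + timestep_encoding(t, self.d_enc) @ self.W_scale
            else:
                # Plain looped model: every iteration applies the identical map.
                gamma = np.ones(x.shape[1])
            x = x + gamma * self.block(x)  # residual update, scaled per loop
        return x


model = LoopedTransformerSketch(d_model=8, d_enc=4)
x = np.random.default_rng(1).normal(size=(5, 8))
print(model(x, num_loops=3).shape)  # (5, 8)
```

Setting `use_timestep=False` recovers a plain looped block in which every iteration applies exactly the same map. The timestep-conditioned gamma_t lets iterations differ while the block weights stay shared, which illustrates the kind of per-loop flexibility the abstract describes.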

 
