Emergent Stack Representations in Modeling Counter Languages Using Transformers

Abstract

Transformer architectures are the backbone of most modern language models, but understanding the inner workings of these models remains largely an open problem. Prior work has approached this problem by training these architectures on well-understood classes of formal languages, which isolates their learning capabilities. We extend this literature by analyzing models trained on counter languages, which can be modeled using counter variables. We train transformer models on four counter languages and equivalently formulate these languages using stacks, whose depths can be understood as the counter values. We then probe the models' internal representations for the stack depth at each input token and show that these models, when trained as next-token predictors, learn stack-like representations. This brings us closer to understanding the algorithmic details of how transformers learn languages and helps in circuit discovery.
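
To make the stack formulation concrete, the following is a minimal sketch of how per-token stack depths could be computed for a single counter language. It assumes Dyck-1 (balanced parentheses) purely for illustration; the abstract does not name the four languages, and the function names `stack_depths` and `is_balanced` are hypothetical. The depth recorded after each token equals the value a single counter would hold at that position, which is the kind of quantity a probe on the model's hidden states might be trained to predict.

```python
from typing import List, Optional, Sequence


def stack_depths(tokens: Sequence[str], push: str = "(", pop: str = ")") -> Optional[List[int]]:
    """Return the stack depth after each token.

    Returns None if a pop is attempted on an empty stack, i.e. the
    counter would go negative. Dyck-1 is an illustrative assumption,
    not a language confirmed by the source.
    """
    stack: List[str] = []
    depths: List[int] = []
    for tok in tokens:
        if tok == push:
            stack.append(tok)      # counter += 1
        elif tok == pop:
            if not stack:
                return None        # counter would drop below zero
            stack.pop()            # counter -= 1
        else:
            raise ValueError(f"unexpected token: {tok!r}")
        depths.append(len(stack))  # stack depth == counter value
    return depths


def is_balanced(tokens: Sequence[str]) -> bool:
    """A string is in Dyck-1 if the depth never goes negative and ends at zero."""
    depths = stack_depths(tokens)
    return depths is not None and (not depths or depths[-1] == 0)


if __name__ == "__main__":
    print(stack_depths("(()())"))
    print(is_balanced("(()())"))
```

Under these assumptions, the list returned by `stack_depths` gives one label per input token, so it lines up position by position with a model's per-token hidden states.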
