Abstract
Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge at long times. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.
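A small numerical experiment can illustrate the particle-system view and the emergence of clusters. This is a toy sketch under stated assumptions, not code from the paper. It treats each token as a point on the unit sphere and takes the query, key and value matrices to be the identity. It also introduces an inverse-temperature parameter `beta` and an explicit Euler step followed by renormalization. The function names `simulate_attention_particles` and `min_pairwise_cosine` are hypothetical. Under these assumptions, each token moves toward a softmax-weighted (attention) average of all tokens, and the tokens typically collapse into a cluster as time grows.

```python
import numpy as np


def simulate_attention_particles(n=32, d=3, beta=1.0, dt=0.1, steps=3000, seed=0):
    """Toy self-attention particle system on the unit sphere (hypothetical sketch).

    Assumptions: Q = K = V = identity; explicit Euler time stepping with
    step size dt, followed by renormalization back onto the sphere.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # tokens start on the sphere

    for _ in range(steps):
        logits = beta * (x @ x.T)                    # pairwise inner products <x_i, x_j>
        logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)            # row-wise softmax = attention weights
        v = w @ x                                    # attention output for each token
        # Project onto the tangent space at x_i: v_i - <v_i, x_i> x_i
        v -= np.sum(v * x, axis=1, keepdims=True) * x
        x = x + dt * v                               # Euler step
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract onto the sphere
    return x


def min_pairwise_cosine(x):
    """Smallest cosine similarity between any two tokens (1.0 means full collapse)."""
    return float((x @ x.T).min())


if __name__ == "__main__":
    final = simulate_attention_particles()
    print("min pairwise cosine:", min_pairwise_cosine(final))
```

A minimum pairwise cosine close to 1 indicates that all tokens have merged into a single cluster. Increasing `beta` makes the attention weights more local and may leave several clusters in place for a long time before they merge.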