Abstract
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision, and reinforcement learning. In the field of natural language processing, for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of \emph{"X-former"} models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around computational and memory \emph{efficiency}. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains.