Efficient Transformers: A Survey

  • 2020-09-14 20:38:14
  • Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

Abstract

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing, for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed (Reformer, Linformer, Performer and Longformer, to name a few) that improve upon the original Transformer architecture, many of them by improving computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains.
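To make the efficiency concern concrete: in standard self-attention the score matrix has one entry per pair of tokens, so its time and memory cost grow quadratically with sequence length N. The following is a minimal NumPy sketch, not code from the survey, contrasting full attention with a Linformer-style low-rank projection of keys and values along the sequence axis. The function names, the sizes N, d and k, and the use of fixed random projection matrices E and F (Linformer learns them) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Standard scaled dot-product attention: the score matrix is N x N,
    # so time and memory grow quadratically with sequence length N.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # shape (N, N)
    return softmax(scores) @ V, scores.shape

def low_rank_attention(Q, K, V, E, F):
    # Linformer-style sketch (assumption: E and F are fixed k x N matrices;
    # in Linformer they are learned). Keys and values are projected from
    # N rows down to k rows, so the score matrix is only N x k.
    d = Q.shape[-1]
    K_proj = E @ K                           # shape (k, d)
    V_proj = F @ V                           # shape (k, d)
    scores = Q @ K_proj.T / np.sqrt(d)       # shape (N, k)
    return softmax(scores) @ V_proj, scores.shape

rng = np.random.default_rng(0)
N, d, k = 1024, 64, 32                       # hypothetical sizes
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
E = rng.standard_normal((k, N)) / np.sqrt(N)
F = rng.standard_normal((k, N)) / np.sqrt(N)

out_full, full_shape = full_attention(Q, K, V)
out_low, low_shape = low_rank_attention(Q, K, V, E, F)
print(full_shape, low_shape)                 # (1024, 1024) (1024, 32)
print(out_full.shape, out_low.shape)         # (1024, 64) (1024, 64)
```

Both variants return one d-dimensional output per token, but the low-rank score matrix holds N x k entries instead of N x N, which is the kind of saving many efficiency-oriented "X-former" models pursue by different means.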
