Transformers do not scale well to long sequence lengths, largely because of the quadratic complexity of self-attention. In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming model quality superior or comparable to that of vanilla Transformer models. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking across a wide spectrum of tasks and datasets makes it difficult to assess relative quality among the many proposed models. This paper proposes a systematic and unified benchmark, LRA, specifically focused on evaluating model quality under long-context scenarios. Our benchmark is a suite of tasks consisting of sequences ranging from $1K$ to $16K$ tokens, encompassing a wide range of data types and modalities (text, natural and synthetic images, and mathematical expressions) that require similarity, structural, and visual-spatial reasoning. We systematically evaluate ten well-established long-range Transformer models, including Reformers, Linformers, Linear Transformers, Sinkhorn Transformers, Performers, Synthesizers, Sparse Transformers, and Longformers, on our newly proposed benchmark suite. LRA paves the way towards a better understanding of this class of efficient Transformer models, facilitates more research in this direction, and presents new challenging tasks to tackle. Our benchmark code will be released at https://github.com/google-research/long-range-arena.