Investigating Recurrent Transformers with Dynamic Halt

Abstract

In this paper, we study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism: (1) the approach of incorporating a depth-wise recurrence similar to Universal Transformers; and (2) the approach of incorporating a chunk-wise temporal recurrence like Temporal Latent Bottleneck. Furthermore, we propose and investigate novel ways to extend and combine the above methods; for example, we propose a global mean-based dynamic halting mechanism for Universal Transformer and an augmentation of Temporal Latent Bottleneck with elements from Universal Transformer. We compare the models and probe their inductive biases in several diagnostic tasks such as Long Range Arena (LRA), flip-flop language modeling, ListOps, and Logical Inference.
