AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning

Abstract

In this paper we investigate transformer architectures designed for partially observable online reinforcement learning. The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies, and it is the main reason behind the architecture's effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their applicability in online reinforcement learning: (1) in order to remember all past information, the self-attention mechanism requires the whole history to be provided as context, and (2) inference in transformers is computationally expensive. In this paper, we introduce recurrent alternatives to the transformer self-attention mechanism that offer context-independent inference cost, leverage long-range dependencies effectively, and perform well in online reinforcement learning tasks. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially observable environments (e.g., T-Maze, Mystery Path, Craftax, and Memory Maze). Compared with a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use by more than 50%. Our approach performs either similarly to or better than GTrXL, improving by more than 37% upon GTrXL performance in harder tasks.
