Structured State Space Models for In-Context Reinforcement Learning

Abstract

Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers and performs better than LSTM models on a simple memory-based task. Then, by leveraging the model's ability to handle long-range sequences, we achieve strong performance on a challenging meta-learning task in which the agent is given a randomly-sampled continuous control environment, combined with a randomly-sampled linear projection of the environment's observations and actions. Furthermore, we show that the resulting model can adapt to out-of-distribution held-out tasks. Overall, the results presented in this paper suggest that S4 models are a strong contender for the default architecture used for in-context reinforcement learning.
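The abstract does not spell out how the hidden state can be reset in parallel. One way to obtain that property is sketched below. It is a minimal sketch under stated assumptions, not the paper's code. It assumes a single channel of a diagonal linear recurrence x_k = a_k * x_{k-1} + b_k (with b_k standing for the input term, e.g. B̄u_k) and a per-step boolean `done` flag that marks the first step of a new episode, where the state restarts from zero. Attaching the flag to each scan element gives an associative operator, so the whole trajectory can be computed with a parallel prefix scan instead of a step-by-step loop. The names `combine`, `inclusive_scan`, `resettable_states` and `sequential_states` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# One scan element for a single channel of a diagonal linear recurrence:
# (a_k, b_k, d_k) = (decay, input contribution, reset flag). Hypothetical layout.
Elem = Tuple[float, float, bool]


def combine(earlier: Elem, later: Elem) -> Elem:
    """Associative operator for x_k = a_k * x_{k-1} + b_k with resets.

    If the later element carries a reset flag, everything before it is
    discarded; otherwise the two affine maps are composed.
    """
    a_i, b_i, d_i = earlier
    a_j, b_j, d_j = later
    if d_j:
        return later
    return (a_j * a_i, a_j * b_i + b_j, d_i)


def inclusive_scan(elems: List[Elem]) -> List[Elem]:
    """Hillis-Steele prefix scan: O(log n) rounds, each data-parallel."""
    out = list(elems)
    offset = 1
    while offset < len(out):
        # Each position reads only the previous round's values, so on parallel
        # hardware all positions in a round could be updated simultaneously.
        out = [
            combine(out[i - offset], out[i]) if i >= offset else out[i]
            for i in range(len(out))
        ]
        offset *= 2
    return out


def resettable_states(a, b, done) -> List[float]:
    """Hidden states x_k via the scan, assuming x_{-1} = 0."""
    return [elem[1] for elem in inclusive_scan(list(zip(a, b, done)))]


def sequential_states(a, b, done) -> List[float]:
    """Reference loop: restart from a zero state on steps flagged as done."""
    x, xs = 0.0, []
    for a_k, b_k, d_k in zip(a, b, done):
        x = b_k if d_k else a_k * x + b_k
        xs.append(x)
    return xs


if __name__ == "__main__":
    a = [0.5, 0.5, 0.5, 0.25, 0.25, 0.5]
    b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    done = [True, False, False, True, False, False]  # two episodes in one sequence
    print(resettable_states(a, b, done))  # [1.0, 2.5, 4.25, 4.0, 6.0, 9.0]
    print(sequential_states(a, b, done))  # [1.0, 2.5, 4.25, 4.0, 6.0, 9.0]
```

Both functions agree because `combine` is associative, which is what lets a scan regroup the work freely. In a real model one would more likely hand an operator like this to a library scan such as `jax.lax.associative_scan`. That function applies the operator to whole batched arrays, so the Python `if` would need to become an elementwise select (for example `jnp.where`).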
