Abstract
Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.