Vintix: Action Model via In-Context Reinforcement Learning

Abstract

In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation to construct versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems. Code to be released at https://github.com/dunnolab/vintix
