PASTA: Pretrained Action-State Transformer Agents

Abstract

Self-supervised learning has brought about a paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches pre-train transformer models on vast amounts of unlabeled data, and these models then serve as a starting point for efficiently solving downstream tasks. In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories. This advancement enables the models to tackle a broad spectrum of tasks, ranging from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper conducts a comprehensive investigation of models referred to as pre-trained action-state transformer agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks, including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our objective is to systematically compare various design choices and offer insights that will help practitioners develop robust models. Key highlights of our study include tokenization at the component level for actions and states, the use of fundamental pre-training objectives such as next token prediction or masked language modeling, simultaneous training of models across multiple domains, and the application of various fine-tuning strategies. The models developed in this study contain fewer than 7 million parameters, allowing a broad community to use them and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and will contribute to robust policy learning.
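The abstract names component-level tokenization and two generic pre-training objectives without detailing them. The sketch below is a minimal illustration under stated assumptions, not the PASTA implementation: it assumes each scalar state and action dimension is uniformly discretized into `num_bins` bins over a fixed range `[low, high]`, and that within each timestep the state components come first, followed by the action components. The helper names (`discretize`, `tokenize_trajectory`, `mask_tokens`, `next_token_pairs`) and the reserved `mask_id` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def discretize(value: float, low: float, high: float, num_bins: int) -> int:
    """Map a scalar to one of num_bins uniform bins over [low, high] (assumed scheme).

    Values outside the range are clipped to the nearest edge bin.
    """
    clipped = min(max(value, low), high)
    frac = (clipped - low) / (high - low)
    return min(int(frac * num_bins), num_bins - 1)


def tokenize_trajectory(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    low: float = -1.0,
    high: float = 1.0,
    num_bins: int = 256,
) -> List[int]:
    """Component-level tokenization: every state and action dimension is one token.

    Assumed per-timestep layout: s_1 .. s_ds, then a_1 .. a_da.
    """
    tokens: List[int] = []
    for state, action in zip(states, actions):
        tokens.extend(discretize(x, low, high, num_bins) for x in state)
        tokens.extend(discretize(x, low, high, num_bins) for x in action)
    return tokens


def mask_tokens(
    tokens: Sequence[int], mask_prob: float, rng: random.Random, mask_id: int
) -> Tuple[List[int], List[int]]:
    """Masked-modeling input: replace each token with mask_id with probability mask_prob.

    Returns the corrupted inputs and the positions the model must reconstruct.
    """
    inputs = list(tokens)
    positions: List[int] = []
    for i in range(len(tokens)):
        if rng.random() < mask_prob:
            inputs[i] = mask_id
            positions.append(i)
    return inputs, positions


def next_token_pairs(tokens: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Next-token-prediction pairs: predict tokens[t + 1] from the prefix tokens[: t + 1]."""
    return [(list(tokens[: t + 1]), tokens[t + 1]) for t in range(len(tokens) - 1)]


if __name__ == "__main__":
    # Two timesteps, 2-dimensional states, 1-dimensional actions, 4 bins over [-1, 1].
    toks = tokenize_trajectory([[0.0, 0.5], [-1.0, 1.0]], [[0.25], [-0.5]], num_bins=4)
    print(toks)
    # Use num_bins (here 4) as the reserved mask id so it cannot collide with a value bin.
    print(mask_tokens(toks, 0.5, random.Random(0), mask_id=4))
    print(next_token_pairs(toks)[0])
```

With `num_bins=4`, the two-step trajectory above yields 6 tokens, one per component, whereas a tokenizer that emits one token per state and one per action would produce only 4. Component-level tokens let a masked objective hide individual sensor readings, which is one plausible way such a model could be probed for the sensor-failure robustness task mentioned above.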
