Abstract
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io