Abstract
Robot learning is witnessing a significant increase in the size, diversity, and complexity of pre-collected datasets, mirroring trends in domains such as natural language processing and computer vision. Many robot learning methods treat such datasets as multi-task expert data and learn a multi-task, generalist policy by training broadly across them. Notably, while these generalist policies can improve the average performance across many tasks, the performance of generalist policies on any one task is often suboptimal due to negative transfer between partitions of the data, compared to task-specific specialist policies. In this work, we argue for the paradigm of training policies during deployment given the scenarios they encounter: rather than deploying pre-trained policies to unseen problems in a zero-shot manner, we non-parametrically retrieve and train models directly on relevant data at test time. Furthermore, we show that many robotics tasks share considerable amounts of low-level behaviors and that retrieval at the "sub"-trajectory granularity enables significantly improved data utilization, generalization, and robustness in adapting policies to novel problems. In contrast, existing full-trajectory retrieval methods tend to underutilize the data and miss out on shared cross-task content. This work proposes STRAP, a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion. STRAP outperforms both prior retrieval algorithms and multi-task learning methods in simulated and real experiments, showing the ability to scale to much larger offline datasets in the real world as well as the ability to learn robust control policies with just a handful of real-world demonstrations.
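
To make the retrieval idea concrete, the following is a minimal sketch of subsequence dynamic time warping, which finds the span of a long reference sequence that best aligns with a short query. It is an illustration under stated assumptions, not STRAP's implementation: it assumes per-frame embeddings (for example, from a vision foundation model) have already been computed as NumPy arrays, uses squared Euclidean distance as the frame cost, and the function name `subsequence_dtw` is hypothetical.

```python
import numpy as np


def subsequence_dtw(query, reference):
    """Find the span of `reference` that best aligns with `query`.

    query:     array of shape (n, d), assumed precomputed frame embeddings
    reference: array of shape (m, d), assumed precomputed frame embeddings
    Returns (cost, start, end), with reference[start:end + 1] the matched span.
    """
    n, m = len(query), len(reference)
    # Pairwise squared Euclidean cost between frames (an assumed metric).
    cost = ((query[:, None, :] - reference[None, :, :]) ** 2).sum(axis=-1)

    # Accumulated cost; row 0 is all zeros so a match may start anywhere.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, :] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # The match may also end anywhere: pick the cheapest cell in the last row.
    end = int(np.argmin(acc[n, 1:])) + 1

    # Backtrack to the first query frame to recover where the match starts.
    i, j = n, end
    while i > 1:
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[n, end]), j - 1, end - 1
```

In a retrieval setting, one would run such a routine for a query sub-trajectory against each trajectory in the offline corpus and keep the lowest-cost spans as training data. The quadratic Python loop is kept for readability and would need vectorizing or compiling for large corpora.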