How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation

Abstract

Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as robotics, where such interaction is expensive. In this work we investigate ways to minimize online interactions in a target task by reusing a suboptimal policy we might have access to, for example from training on related prior tasks or in simulation. To this end, we develop two RL algorithms that can speed up training by using not only the action distributions of teacher policies, but also data collected by such policies on the task at hand. We conduct a thorough experimental study of how to use suboptimal teachers on a challenging robotic manipulation benchmark on vision-based stacking with diverse objects. We compare our methods to offline, online, offline-to-online, and kickstarting RL algorithms. By doing so, we find that training on data from both the teacher and the student enables the best performance for limited data budgets. We examine how best to allocate a limited data budget on the target task between the teacher and the student policy, and report experiments using varying budgets, two teachers with different degrees of suboptimality, and five stacking tasks that require a diverse set of behaviors. Our analysis, both in simulation and in the real world, shows that our approach performs best across data budgets, while standard offline RL from teacher rollouts is surprisingly effective when enough data is given.
