QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Abstract

In this paper, we study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach. We study this problem in the context of grasping, a longstanding challenge in robotic manipulation. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, our method enables closed-loop vision-based control, whereby the robot continuously updates its grasp strategy based on the most recent observations to optimize long-horizon grasp success. To that end, we introduce QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects. Aside from attaining a very high success rate, our method exhibits behaviors that are quite distinct from more standard grasping systems: using only RGB vision-based perception from an over-the-shoulder camera, our method automatically learns regrasping strategies, probes objects to find the most effective grasps, learns to reposition objects and perform other non-prehensile pre-grasp manipulations, and responds dynamically to disturbances and perturbations.
