Grasp2Vec: Learning Object Representations from Self-Supervised Grasping

Abstract

Well structured visual representations can make robot learning faster and can improve generalization. In this paper, we study how we can acquire effective object-centric representations for robotic manipulation tasks without human labeling by using autonomous robot interaction with the environment. Such representation learning methods can benefit from continuous refinement of the representation as the robot collects more experience, allowing them to scale effectively without human intervention. Our representation learning approach is based on object persistence: when a robot removes an object from a scene, the representation of that scene should change according to the features of the object that was removed. We formulate an arithmetic relationship between feature vectors from this observation, and use it to learn a representation of scenes and objects that can then be used to identify object instances, localize them in the scene, and perform goal-directed grasping tasks where the robot must retrieve commanded objects from a bin. The same grasping procedure can also be used to automatically collect training data for our method, by recording images of scenes, grasping and removing an object, and recording the outcome. Our experiments demonstrate that this self-supervised approach for tasked grasping substantially outperforms direct reinforcement learning from images and prior representation learning methods.
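
The object-persistence idea can be illustrated with a toy sketch: the scene embedding before a grasp, minus the scene embedding after it, should approximate the embedding of the removed object. The code below is not the paper's model. It assumes, purely for illustration, that each object has a fixed feature vector and that a scene embedding is the sum of the vectors of the objects present; the names `object_features`, `scene_embedding`, and `identify_removed` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned embeddings: one random vector per object.
object_features = {name: rng.normal(size=8) for name in ["cup", "ball", "block"]}


def scene_embedding(objects_in_scene):
    """Toy scene embedding (assumption): sum of the features of the objects present."""
    return sum((object_features[o] for o in objects_in_scene), np.zeros(8))


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify_removed(pre, post):
    """Return the object whose features best match the change in scene embedding."""
    diff = pre - post
    return max(object_features, key=lambda name: cosine(diff, object_features[name]))


pre = scene_embedding(["cup", "ball", "block"])   # scene before the grasp
post = scene_embedding(["cup", "block"])          # scene after removing the ball
removed = identify_removed(pre, post)
print(removed)
```

In this toy setting the difference vector equals the removed object's features exactly; a learned representation would only satisfy the relationship approximately, which is why the lookup uses nearest-neighbor similarity rather than equality.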
