Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

Abstract

Perceptual understanding of the scene and the relationships between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most current methodologies learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large labeled datasets for each task, which are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object-agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks because they fail to capture the complexity of scenes with many components. In this paper, we explore the effectiveness of using object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment, and they are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images. Our results show a 20 percent increase in performance in low-data regimes (1000 trajectories) in policy training using implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines for the task of object localization in multi-object scenes.
