Self-Supervised Learning of State Estimation for Manipulating Deformable Linear Objects

Abstract

We demonstrate model-based, visual robot manipulation of deformable linear objects. Our approach is based on a state-space representation of the physical system that the robot aims to control. This choice has multiple advantages, including the ease of incorporating physical priors in the dynamics model and perception model, and the ease of planning manipulation actions. In addition, physical states can naturally represent object instances of different appearances. Therefore, dynamics in the state space can be learned in one setting and directly used in other visually different settings. This is in contrast to dynamics learned in pixel space or latent space, where generalization to visual differences is not guaranteed. The challenges in taking the state-space approach are estimating the high-dimensional state of a deformable object from raw images, where annotations are very expensive on real data, and finding a dynamics model that is accurate, generalizable, and efficient to compute. We are the first to demonstrate self-supervised training of rope state estimation on real images, without requiring expensive annotations. This is achieved by our novel differentiable renderer and image loss, which generalize across a wide range of visual appearances. With estimated rope states, we train a fast and differentiable neural network dynamics model that encodes the physics of mass-spring systems. Our method predicts future states more accurately than models that neither involve explicit state estimation nor use a physics prior. We also show that our approach achieves more efficient manipulation, both in simulation and on a real robot, when used within a model predictive controller.
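To make the mass-spring prior concrete, the following is a minimal sketch of a hand-written mass-spring simulator for a rope modeled as a chain of 2D nodes. It is not the learned neural network dynamics model described above; it only illustrates the kind of physics such a model can encode. The function names (`spring_forces`, `step`), the semi-implicit Euler integrator, and all parameter values (`stiffness`, `damping`, `mass`, `dt`) are assumptions chosen for illustration.

```python
import math


def spring_forces(points, rest_length, stiffness):
    """Return the net spring force on each 2D node of a chain (hypothetical helper)."""
    forces = [[0.0, 0.0] for _ in points]
    for i in range(len(points) - 1):
        dx = points[i + 1][0] - points[i][0]
        dy = points[i + 1][1] - points[i][1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # coincident nodes: direction is undefined, so skip this spring
        # Hooke's law: a stretched spring (dist > rest_length) pulls the nodes together.
        magnitude = stiffness * (dist - rest_length)
        fx, fy = magnitude * dx / dist, magnitude * dy / dist
        forces[i][0] += fx
        forces[i][1] += fy
        forces[i + 1][0] -= fx
        forces[i + 1][1] -= fy
    return forces


def step(points, velocities, rest_length,
         stiffness=50.0, damping=0.5, mass=1.0, dt=0.01):
    """Advance the chain by one time step with semi-implicit Euler (assumed integrator)."""
    forces = spring_forces(points, rest_length, stiffness)
    new_points, new_velocities = [], []
    for (px, py), (vx, vy), (fx, fy) in zip(points, velocities, forces):
        # Acceleration from the spring force plus linear velocity damping.
        ax = (fx - damping * vx) / mass
        ay = (fy - damping * vy) / mass
        # Update velocity first, then position with the new velocity.
        vx, vy = vx + dt * ax, vy + dt * ay
        new_points.append((px + dt * vx, py + dt * vy))
        new_velocities.append((vx, vy))
    return new_points, new_velocities


if __name__ == "__main__":
    # A two-node rope stretched to twice its rest length contracts over time.
    pts = [(0.0, 0.0), (2.0, 0.0)]
    vel = [(0.0, 0.0), (0.0, 0.0)]
    for _ in range(100):
        pts, vel = step(pts, vel, rest_length=1.0)
    print(pts)
```

In a model predictive controller, a step function of this kind would be rolled out over a short horizon for candidate actions. How gripper actions enter the state is not specified in the abstract and is left out of this sketch.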
