Extracting Contact and Motion from Manipulation Videos

Abstract

When we physically interact with our environment using our hands, we touch objects and force them to move: contact and motion are defining properties of manipulation. In this paper, we present an active, bottom-up method for the detection of actor-object contacts and the extraction of moved objects and their motions in RGBD videos of manipulation actions. At the core of our approach lies non-rigid registration: we continuously warp a point cloud model of the observed scene to the current video frame, generating a set of dense 3D point trajectories. Under loose assumptions, we employ simple point cloud segmentation techniques to extract the actor and subsequently detect actor-environment contacts based on the estimated trajectories. For each such interaction, using the detected contact as an attention mechanism, we obtain an initial motion segment for the manipulated object by clustering trajectories in the contact area vicinity and then we jointly refine the object segment and estimate its 6DOF pose in all observed frames. Because of its generality and the fundamental, yet highly informative, nature of its outputs, our approach is applicable to a wide range of perception and planning tasks. We qualitatively evaluate our method on a number of input sequences and present a comprehensive robot imitation learning example, in which we demonstrate the crucial role of our outputs in developing action representations/plans from observation.
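To make the contact-detection idea concrete, the following is a minimal sketch and not the method of the paper: it assumes that per-frame 3D points for the actor and for the rest of the scene are already available, and it flags a frame as a contact when the two sets come closer than a distance threshold. The function names, the brute-force nearest-point search, and the 1 cm threshold are illustrative assumptions.

```python
import math


def min_distance(actor_points, scene_points):
    """Smallest Euclidean distance between any actor point and any scene point."""
    return min(math.dist(a, s) for a in actor_points for s in scene_points)


def detect_contact_frames(actor_traj, scene_traj, threshold=0.01):
    """Return indices of frames in which actor and scene are within `threshold`.

    actor_traj, scene_traj: one list of (x, y, z) tuples per frame.
    threshold: assumed contact distance in metres (hypothetical value).
    """
    return [
        t
        for t, (actor, scene) in enumerate(zip(actor_traj, scene_traj))
        if min_distance(actor, scene) <= threshold
    ]


# Toy data: a single "hand" point approaching a single "object" point.
actor_traj = [[(0.0, 0.0, 0.30)], [(0.0, 0.0, 0.10)], [(0.0, 0.0, 0.005)]]
scene_traj = [[(0.0, 0.0, 0.0)]] * 3
contact_frames = detect_contact_frames(actor_traj, scene_traj)
```

In a fuller pipeline, the points near a detected contact would seed the trajectory clustering that yields the initial object segment; a spatial index such as a k-d tree would replace the brute-force search for dense point clouds.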
