Learning Actionable Representations from Visual Observations

Abstract

In this work we explore a new approach for robots to teach themselves aboutthe world simply by observing it. In particular we investigate theeffectiveness of learning task-agnostic representations for continuous controltasks. We extend Time-Contrastive Networks (TCN) that learn from visualobservations by embedding multiple frames jointly in the embedding space asopposed to a single frame. We show that by doing so, we are now able to encodeboth position and velocity attributes significantly more accurately. We testthe usefulness of this self-supervised approach in a reinforcement learningsetting. We show that the representations learned by agents observingthemselves take random actions, or other agents perform tasks successfully, canenable the learning of continuous control policies using algorithms likeProximal Policy Optimization (PPO) using only the learned embeddings as input.We also demonstrate significant improvements on the real-world Pouring datasetwith a relative error reduction of 39.4% for motion attributes and 11.1% forstatic attributes compared to the single-frame baseline. Video results areavailable at https://sites.google.com/view/actionablerepresentations .

Quick Read (beta)

loading the full paper ...