Latent Plans for Task-Agnostic Offline Reinforcement Learning

Abstract

Everyday tasks that are long-horizon and comprise a sequence of multiple implicit subtasks still pose a major challenge in offline robot control. While a number of prior methods have aimed to address this setting with variants of imitation learning and offline reinforcement learning, the learned behavior is typically narrow and often struggles to reach configurable long-horizon goals. As both paradigms have complementary strengths and weaknesses, we propose a novel hierarchical approach that combines the strengths of both methods to learn task-agnostic long-horizon policies from high-dimensional camera observations. Concretely, we combine a low-level policy that learns latent skills via imitation learning with a high-level policy, learned with offline reinforcement learning, that chains these latent behavior priors. Experiments in various simulated and real robot control tasks show that our formulation can produce previously unseen combinations of skills to reach temporally extended goals by "stitching" together latent skills through goal chaining, with an order-of-magnitude improvement in performance over state-of-the-art baselines. We even learn a single multi-task visuomotor policy for 25 distinct manipulation tasks in the real world, which outperforms both imitation learning and offline reinforcement learning techniques.
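The two-level structure described in the abstract can be illustrated with a minimal control-loop sketch. It assumes that the high-level policy maps an observation and a goal to a latent plan, that this plan is held fixed for a fixed number of steps, and that the low-level policy decodes it into actions in between. The names `rollout`, `replan_every`, `toy_high`, `toy_low`, and `toy_step` are hypothetical stand-ins on a 1-D toy problem. They are not the authors' networks, training procedure, or replanning schedule; the real policies are learned from camera observations.

```python
from typing import Callable, List

# Hypothetical signatures (assumptions, not the paper's API):
#   high-level: (observation, goal) -> latent plan z
#   low-level:  (observation, z)    -> action
HighLevel = Callable[[float, float], float]
LowLevel = Callable[[float, float], float]
StepFn = Callable[[float, float], float]


def rollout(obs: float, goal: float, high: HighLevel, low: LowLevel,
            step: StepFn, replan_every: int = 4, horizon: int = 40) -> List[float]:
    """Run a two-level policy: pick a latent plan every `replan_every` steps
    (an assumed fixed schedule) and decode it into actions in between."""
    trajectory = [obs]
    z = high(obs, goal)
    for t in range(horizon):
        if t > 0 and t % replan_every == 0:
            z = high(obs, goal)      # high-level re-selects a latent skill
        action = low(obs, z)         # low-level decodes the skill into an action
        obs = step(obs, action)      # environment transition
        trajectory.append(obs)
    return trajectory


def toy_high(obs: float, goal: float) -> float:
    # Stand-in for an offline-RL high-level policy: clipped direction to the goal.
    return max(-1.0, min(1.0, goal - obs))


def toy_low(obs: float, z: float) -> float:
    # Stand-in for an imitation-learned skill decoder: scale the latent into a step.
    return 0.25 * z


def toy_step(obs: float, action: float) -> float:
    # Trivial 1-D dynamics: the action is added to the state.
    return obs + action


traj = rollout(0.0, 2.0, toy_high, toy_low, toy_step)
print(traj[-1])  # the toy agent settles at the goal, 2.0
```

Holding `z` fixed between replanning points is what lets the high-level policy reason over temporally extended skills rather than individual actions; the toy decoder here is deliberately trivial so that only the hierarchy is visible.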
