Dynamics-Aware Unsupervised Discovery of Skills

Abstract

Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment. A good model can potentially enable planning algorithms to generate a large variety of behaviors and solve diverse tasks. However, learning an accurate model for complex dynamical systems is difficult, and even then, the model might not generalize well outside the distribution of states on which it was trained. In this work, we combine model-based learning with model-free learning of primitives that make model-based planning easy. To that end, we aim to answer the question: how can we discover skills whose outcomes are easy to predict? We propose an unsupervised learning algorithm, Dynamics-Aware Discovery of Skills (DADS), which simultaneously discovers predictable behaviors and learns their dynamics. Our method can leverage continuous skill spaces, theoretically, allowing us to learn infinitely many behaviors even for high-dimensional state-spaces. We demonstrate that zero-shot planning in the learned latent space significantly outperforms standard MBRL and model-free goal-conditioned RL, can handle sparse-reward tasks, and substantially improves over prior hierarchical RL methods for unsupervised skill discovery.
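
To make the idea of planning over skills rather than raw actions concrete, the following is a minimal toy sketch, not the DADS implementation. Every name in it (`predict_next_state`, `rollout`, `plan_skills`) is hypothetical. The hand-written additive "skill dynamics" stands in for a learned model, and the random-shooting search is a simple planner swapped in for illustration only. It assumes a 2-D state and a continuous 2-D skill whose predicted effect is to displace the state by the skill vector.

```python
import math
import random


def predict_next_state(state, skill):
    """Hypothetical stand-in for a learned skill-dynamics model.

    Assumption for this toy: running skill z from state s is predicted
    to displace the state by exactly z. A real model would be learned.
    """
    return (state[0] + skill[0], state[1] + skill[1])


def rollout(state, plan):
    """Predict the final state reached by running the skills in order."""
    for skill in plan:
        state = predict_next_state(state, skill)
    return state


def plan_skills(state, goal, horizon=3, num_candidates=200, seed=0):
    """Random-shooting search over sequences of continuous skills.

    Samples candidate skill sequences, scores each one by the predicted
    distance of its final state to the goal, and returns the best one.
    """
    rng = random.Random(seed)
    best_plan, best_dist = None, math.inf
    for _ in range(num_candidates):
        # Each skill is a point in the continuous skill space [-1, 1]^2.
        plan = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        dist = math.dist(rollout(state, plan), goal)
        if dist < best_dist:
            best_plan, best_dist = plan, dist
    return best_plan, best_dist


start, goal = (0.0, 0.0), (2.0, 1.5)
plan, predicted_dist = plan_skills(start, goal)
print("skills:", plan)
print("predicted distance to goal:", predicted_dist)
```

Because the search only queries the skill-level model, no task-specific training is involved in this sketch. The task enters solely through the distance-to-goal score used to rank candidate skill sequences.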
