Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

Abstract

In standard reinforcement learning, each new skill requires a manually-designed reward function, which takes considerable manual effort and engineering. Self-supervised goal setting has the potential to automate this process, enabling an agent to propose its own goals and acquire skills that achieve these goals. However, such methods typically rely on manually-designed goal distributions, or heuristics to force the agent to explore a wide range of states. We propose a formal exploration objective for goal-reaching policies that maximizes state coverage. We show that this objective is equivalent to maximizing the entropy of the goal distribution together with goal reaching performance, where goals correspond to entire states. We present an algorithm called Skew-Fit for learning such a maximum-entropy goal distribution, and show that under certain regularity conditions, our method converges to a uniform distribution over the set of possible states, even when we do not know this set beforehand. Skew-Fit enables self-supervised agents to autonomously choose and practice diverse goals. Our experiments show that it can learn a variety of manipulation tasks from images, including opening a door with a real robot, entirely from scratch and without any manually-designed reward function.
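As a rough illustration of how a goal distribution can be pushed toward uniform coverage, the toy sketch below works on a small discrete state space. It assumes the skewing step reweights each visited state by a negative power of its estimated density and then refits a categorical distribution to the reweighted samples. The names `skew_weights`, `skew_fit_step`, `alpha`, and the four-state visitation vector are all hypothetical and chosen for illustration; the abstract does not specify these details, and this is not the authors' image-based implementation.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def skew_weights(density_at_samples, alpha=-1.0):
    """Normalized weights proportional to density**alpha (alpha < 0 assumed)."""
    w = np.power(density_at_samples, alpha)
    return w / w.sum()


def skew_fit_step(samples, n_states, alpha=-1.0):
    """One toy skew-then-fit step on integer-valued states.

    Returns the empirical visitation distribution and the distribution
    refit to the skewed (reweighted) samples.
    """
    # Estimate the current state density from visitation counts.
    counts = np.bincount(samples, minlength=n_states)
    p = counts / counts.sum()
    # Upweight rarely visited states.
    w = skew_weights(p[samples], alpha)
    # Weighted maximum-likelihood fit of a categorical distribution.
    q = np.bincount(samples, weights=w, minlength=n_states)
    return p, q / q.sum()


rng = np.random.default_rng(0)
visitation = np.array([0.7, 0.2, 0.08, 0.02])  # hypothetical, heavily imbalanced
samples = rng.choice(4, size=5000, p=visitation)

p, q = skew_fit_step(samples, n_states=4, alpha=-1.0)
print("entropy before skewing:", entropy(p))
print("entropy after skewing: ", entropy(q))
```

In this toy setting, `alpha = -1.0` cancels the visitation density exactly, so the refit distribution is uniform over every state that was visited at least once. Values of `alpha` between -1 and 0 would move the distribution only part of the way toward uniform.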
