DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies

Abstract

Can we use reinforcement learning to learn general-purpose policies that can perform a wide range of different tasks, resulting in flexible and reusable skills? Contextual policies provide this capability in principle, but the representation of the context determines the degree of generalization and expressivity. Categorical contexts preclude generalization to entirely new tasks. Goal-conditioned policies may enable some generalization, but cannot capture all tasks that might be desired. In this paper, we propose goal distributions as a general and broadly applicable task representation suitable for contextual policies. Goal distributions are general in the sense that they can represent any state-based reward function when equipped with an appropriate distribution class, while the particular choice of distribution class allows us to trade off expressivity and learnability. We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn these policies. We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
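
As a rough illustration of how a goal distribution can serve as a task representation, the sketch below scores a state by its log-likelihood under a diagonal Gaussian goal distribution. The diagonal Gaussian class, the log-likelihood reward, the function name `diag_gaussian_log_prob`, and all numbers are assumptions made for illustration; the abstract does not specify them.

```python
import math


def diag_gaussian_log_prob(state, mean, var):
    """Log-density of `state` under a diagonal Gaussian N(mean, diag(var)).

    Hypothetical helper: the value is used here as a state-based reward.
    """
    return sum(
        -0.5 * ((s - m) ** 2 / v + math.log(2.0 * math.pi * v))
        for s, m, v in zip(state, mean, var)
    )


# Assumed example task: the first state dimension should be near 1.0,
# while the second dimension is nearly irrelevant (very large variance).
mean = [1.0, 0.0]
var = [0.01, 100.0]

# State A matches the first dimension but is far off in the second.
reward_a = diag_gaussian_log_prob([1.0, 5.0], mean, var)
# State B matches the second dimension but misses the first.
reward_b = diag_gaussian_log_prob([2.0, 0.0], mean, var)

print(reward_a > reward_b)
```

In this sketch the variances encode which state dimensions matter for the task: a small variance penalizes deviation heavily, and a large variance makes the dimension close to irrelevant. A single goal state corresponds to the limit of small variance in every dimension, which is one way to see why a distribution is a more expressive context than a goal.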
