Self-Supervised Reinforcement Learning that Transfers using Random Features

Abstract

Model-free reinforcement learning algorithms have exhibited great potential in solving single-task sequential decision-making problems with high-dimensional observations and long horizons, but are known to be hard to generalize across tasks. Model-based RL, on the other hand, learns task-agnostic models of the world that naturally enable transfer across different reward functions, but struggles to scale to complex environments due to compounding error. To get the best of both worlds, we propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards, while circumventing the challenges of model-based RL. In particular, we show that self-supervised pre-training of model-free reinforcement learning with a number of random features as rewards allows implicit modeling of long-horizon environment dynamics. Then, planning techniques like model-predictive control using these implicit models enable fast adaptation to problems with new reward functions. Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks. We validate that our proposed method enables transfer across tasks on a variety of manipulation and locomotion domains in simulation, opening the door to generalist decision-making agents.
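
The abstract does not spell out how random features serve as rewards, so the following is only a minimal sketch of one ingredient that the idea plausibly relies on: if a new reward is approximated as a linear combination of random features, then its discounted return is the same linear combination of the discounted, accumulated features. Everything here is an assumption made for illustration: the toy trajectory, the tanh random projection, the ridge regression, and all names (`phi`, `psi`, `w`). In the method described above the accumulated features would presumably be predicted by pre-trained model-free value functions rather than summed along a known trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: one trajectory of T (state, action) vectors.
T, sa_dim, n_feat, gamma = 50, 6, 256, 0.95
sa = rng.normal(size=(T, sa_dim))

# Random features: a fixed, untrained random projection followed by tanh.
W = rng.normal(size=(sa_dim, n_feat))
b = rng.normal(size=n_feat)
phi = np.tanh(sa @ W + b)                 # shape (T, n_feat)

# Discounted accumulation of every random feature along the trajectory.
discounts = gamma ** np.arange(T)         # shape (T,)
psi = discounts @ phi                     # shape (n_feat,)

# A made-up reward for a "new task", used only as regression labels.
reward = np.sin(sa[:, 0]) + 0.5 * sa[:, 1]

# Ridge regression of the reward onto the random features.
lam = 1e-3
w = np.linalg.solve(phi.T @ phi + lam * np.eye(n_feat), phi.T @ reward)

# Return estimate from accumulated features vs. return of the fitted reward.
approx_return = w @ psi
fitted_return = discounts @ (phi @ w)
true_return = discounts @ reward

print(approx_return, fitted_return, true_return)
```

The first two printed values agree up to floating-point error because discounting and summing are linear operations. How close they come to the third depends on how well the random features fit the reward. A planner such as model-predictive control could score candidate action sequences with estimates of this kind, although the scoring and optimization details are not given in the abstract.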
