Persistent Reinforcement Learning via Subgoal Curricula

Abstract

Reinforcement learning (RL) promises to enable autonomous acquisition of complex behaviors for diverse agents. However, the success of current reinforcement learning algorithms is predicated on an often under-emphasised requirement: each trial needs to start from a fixed initial state distribution. Unfortunately, resetting the environment to its initial state after each trial requires a substantial amount of human supervision and extensive instrumentation of the environment, which defeats the purpose of autonomous reinforcement learning. In this work, we propose Value-accelerated Persistent Reinforcement Learning (VaPRL), which generates a curriculum of initial states such that the agent can bootstrap on the success of easier tasks to efficiently learn harder tasks. The agent also learns to reach the initial states proposed by the curriculum, minimizing the reliance on human interventions during learning. We observe that VaPRL reduces the interventions required by three orders of magnitude compared to episodic RL while outperforming prior state-of-the-art methods for reset-free RL, both in terms of sample efficiency and asymptotic performance, on a variety of simulated robotics problems.
