Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning

Abstract

Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. Code and videos are available at our website: https://sites.google.com/view/skill-critic.
