Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning

  • 2024-07-12 02:59:00
  • Ce Hao, Catherine Weaver, Chen Tang, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse-reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. Code and videos are available at our website:
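The abstract does not state the exact form of the demonstration-guided regularization. One common way to keep a fine-tuned policy close to a distribution learned offline, used in earlier skill-prior RL methods, is to subtract a weighted KL divergence between the policy and the prior from the critic's value. The sketch below illustrates that generic idea with diagonal Gaussians. The function names, the Gaussian parameterization, and the weight `alpha` are assumptions for illustration; this is not Skill-Critic's published objective.

```python
import math
from typing import Sequence


def gaussian_kl(mu_p: Sequence[float], std_p: Sequence[float],
                mu_q: Sequence[float], std_q: Sequence[float]) -> float:
    """KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return kl


def regularized_actor_objective(q_value: float,
                                policy_mu: Sequence[float], policy_std: Sequence[float],
                                prior_mu: Sequence[float], prior_std: Sequence[float],
                                alpha: float) -> float:
    """Hypothetical actor objective: Q(s, a) - alpha * KL(pi(.|s) || prior(.|s)).

    The actor maximizes this value. A larger alpha keeps the policy closer to
    the prior learned from demonstrations; alpha = 0 ignores the prior.
    """
    return q_value - alpha * gaussian_kl(policy_mu, policy_std, prior_mu, prior_std)


if __name__ == "__main__":
    # Assumed usage: the same penalty could be applied at both levels, e.g. a
    # high-level policy over skill latents z regularized toward a skill prior,
    # and a low-level action policy regularized toward the offline skill decoder.
    high = regularized_actor_objective(2.0, [1.0], [1.0], [0.0], [1.0], alpha=0.1)
    low = regularized_actor_objective(1.0, [0.0, 0.0], [1.0, 1.0],
                                      [0.0, 0.0], [1.0, 1.0], alpha=0.5)
    print(high, low)
```

When the policy matches the prior, the KL term is zero and the objective reduces to the critic's value; the penalty grows as the fine-tuned policy drifts away from what the offline data supports.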


