Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

  • 2019-10-10 09:39:15
  • Siyuan Li, Rui Wang, Minxue Tang, Chongjie Zhang

Abstract

Hierarchical Reinforcement Learning (HRL) is a promising approach to solving long-horizon problems with sparse and delayed rewards. Many existing HRL algorithms either use pre-trained low-level skills that cannot be adapted, or require domain-specific information to define low-level rewards. In this paper, we aim to adapt low-level skills to downstream tasks while maintaining the generality of reward design. We propose an HRL framework which sets auxiliary rewards for low-level skill training based on the advantage function of the high-level policy. This auxiliary reward enables efficient, simultaneous learning of the high-level policy and low-level skills without using task-specific knowledge. In addition, we theoretically prove that optimizing low-level skills with this auxiliary reward will increase the task return for the joint policy. Experimental results show that our algorithm dramatically outperforms other state-of-the-art HRL methods in MuJoCo domains. We also find that both the low-level and high-level policies trained by our algorithm are transferable.
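
The central idea of the abstract, a low-level reward derived from the high-level advantage function, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: a one-step advantage estimate for a high-level action that lasts k low-level steps, and an even split of that advantage across those k steps. The function names and parameters are hypothetical and are not taken from the authors' code.

```python
from typing import List


def high_level_advantage(
    reward_sum: float,  # environment reward accumulated over the k low-level steps
    v_start: float,     # high-level value estimate V(s) where the skill was chosen
    v_end: float,       # high-level value estimate V(s') where the skill ended
    gamma: float,       # per-step discount factor
    k: int,             # number of low-level steps the high-level action lasted
) -> float:
    """One-step advantage estimate: A(s, a) ~= R + gamma**k * V(s') - V(s).

    Assumption: a simple temporal-difference style estimate is used here;
    the paper may use a different estimator.
    """
    return reward_sum + (gamma ** k) * v_end - v_start


def auxiliary_rewards(advantage: float, k: int) -> List[float]:
    """Turn one high-level advantage into k low-level auxiliary rewards.

    Assumption: the advantage is split evenly across the k low-level steps.
    """
    if k <= 0:
        raise ValueError("k must be a positive number of low-level steps")
    return [advantage / k] * k


# Example: a skill that ran for 2 steps and moved the agent to a better state.
adv = high_level_advantage(reward_sum=1.0, v_start=0.5, v_end=2.0, gamma=0.5, k=2)
low_level_rewards = auxiliary_rewards(adv, k=2)
```

Under these assumptions, a skill execution that the high-level policy judges better than average (positive advantage) rewards every low-level step that contributed to it, and no task-specific reward shaping is required.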

 
