Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency because the high-level action space, i.e., the goal space, is often large. Searching in a large goal space poses difficulties for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacency region centered at the current state, using an adjacency constraint. We theoretically prove that the proposed adjacency constraint preserves the optimal hierarchical policy, and show that this constraint can be practically implemented by training an adjacency network that discriminates between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks show that our method outperforms state-of-the-art HRL approaches.
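To make the notion of a $k$-step adjacency region concrete, the following is a minimal sketch under strong simplifying assumptions: a small deterministic grid world with four-directional moves, where the region is computed exactly by breadth-first search. The function name `k_step_adjacent`, the grid layout, and the wall positions are hypothetical and chosen only for illustration; this is not the paper's adjacency network, which learns to approximate such a test when exact search is impractical.

```python
from collections import deque


def k_step_adjacent(start, k, walls, width, height):
    """Return the set of cells reachable from `start` in at most k moves.

    Illustrative assumption: a deterministic grid with 4-directional moves.
    """
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (x, y), dist = frontier.popleft()
        if dist == k:
            # Cells at distance k are in the region but are not expanded further.
            continue
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            in_bounds = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if in_bounds and nxt not in walls and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


# Hypothetical 5x5 grid with a short wall to the right of the start cell.
walls = {(1, 0), (1, 1)}
region = k_step_adjacent((0, 0), 2, walls, 5, 5)

# A candidate subgoal is admissible only if it lies inside the region.
print((0, 2) in region)  # reachable in two moves
print((2, 0) in region)  # two cells away, but blocked by the wall
```

In this toy setting, the cell `(2, 0)` is close to the start in coordinate distance yet falls outside the 2-step region because of the wall, which illustrates why adjacency is defined by the number of transitions rather than by a metric on the goal space.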