MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint

Abstract

Hierarchical reinforcement learning (HRL) offers a promising approach for intelligent agents facing complex tasks with sparse rewards: it uses a hierarchical framework that divides a task into subgoals and completes them sequentially. However, current methods struggle to find subgoals that keep the learning process stable. Without additional guidance, relying solely on exploration or heuristic methods to determine subgoals in a large goal space is impractical. To address this issue, we propose MENTOR, a general hierarchical reinforcement learning framework that incorporates human feedback and dynamic distance constraints. MENTOR acts as a "mentor", incorporating human feedback into high-level policy learning to find better subgoals. For the low-level policy, MENTOR uses a dual policy that decouples exploration from exploitation to stabilize training. Furthermore, although humans can easily break a task down into subgoals that steer learning in the right direction, subgoals that are too difficult or too easy can still hinder downstream learning efficiency. We therefore propose the Dynamic Distance Constraint (DDC) mechanism, which dynamically adjusts the space of candidate subgoals so that MENTOR generates subgoals matched to the progress of the low-level policy, from easy to hard. Extensive experiments show that MENTOR achieves significant improvements on complex tasks with sparse rewards while using only a small amount of human feedback.
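The abstract does not say how DDC measures distance or how it schedules the size of the allowed subgoal space. As a rough illustration of the general idea only (a distance bound on subgoals that widens as the low-level policy succeeds and narrows when it fails), the following Python sketch assumes plain Euclidean distance in goal space and a simple windowed success-rate rule. The class `DynamicDistanceConstraint`, its parameters, and the projection step are hypothetical and are not MENTOR's actual implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class DynamicDistanceConstraint:
    """Hypothetical curriculum-style bound on how far a subgoal may lie from the state.

    Assumptions (not from the paper): Euclidean distance stands in for whatever
    distance measure DDC actually uses, and the bound is adjusted from the
    low-level policy's success rate over a fixed window of attempts.
    """
    bound: float = 1.0            # current maximum allowed subgoal distance
    min_bound: float = 0.5        # never shrink below this radius
    max_bound: float = 10.0       # never grow beyond this radius
    step: float = 0.5             # amount by which the bound grows or shrinks
    target_success: float = 0.7   # success rate at which subgoals count as "too easy"
    window: int = 20              # number of attempts per adjustment
    _outcomes: List[bool] = field(default_factory=list)

    def constrain(self, state: Sequence[float], subgoal: Sequence[float]) -> Tuple[float, ...]:
        """Project a proposed subgoal onto the ball of radius `bound` around `state`."""
        d = math.dist(state, subgoal)
        if d <= self.bound:
            return tuple(subgoal)          # already inside the allowed space
        scale = self.bound / d             # shrink the offset to length `bound`
        return tuple(s + scale * (g - s) for s, g in zip(state, subgoal))

    def record(self, reached: bool) -> None:
        """Log whether the low-level policy reached its subgoal; adjust once per window."""
        self._outcomes.append(reached)
        if len(self._outcomes) < self.window:
            return
        rate = sum(self._outcomes) / len(self._outcomes)
        if rate >= self.target_success:
            # Subgoals are mostly reached: allow harder (farther) subgoals.
            self.bound = min(self.bound + self.step, self.max_bound)
        elif rate < self.target_success / 2:
            # Subgoals are mostly missed: fall back to easier (closer) subgoals.
            self.bound = max(self.bound - self.step, self.min_bound)
        # Success rates in between leave the bound unchanged.
        self._outcomes.clear()


if __name__ == "__main__":
    dc = DynamicDistanceConstraint(bound=1.0, window=4)
    print(dc.constrain((0.0, 0.0), (3.0, 4.0)))  # pulled in to roughly (0.6, 0.8)
    for _ in range(4):
        dc.record(True)
    print(dc.bound)  # widened after a fully successful window
```

Projecting an over-ambitious subgoal back into the allowed radius, rather than discarding it, keeps the high-level policy's proposed direction while limiting difficulty; a real system could instead resample or penalize such subgoals.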
