Abstract
Hierarchical reinforcement learning (HRL) is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, as it is challenging to train a higher-level policy when the lower-level primitive is non-stationary. In this paper, we present CRISP, a novel HRL algorithm that effectively generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. CRISP uses the lower-level primitive to periodically perform data relabeling on a handful of expert demonstrations, using a novel primitive-informed parsing (PIP) approach, thereby mitigating non-stationarity. Since our approach only assumes access to a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluations on complex robotic maze navigation and robotic manipulation tasks demonstrate that inducing hierarchical curriculum learning significantly improves sample efficiency and results in efficient goal-conditioned policies for solving temporally extended tasks. Additionally, we perform real-world robotic experiments on complex manipulation tasks and show that CRISP exhibits impressive generalization in real-world scenarios.