Hierarchical Imitation and Reinforcement Learning

Abstract

We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction. Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration. Using long-horizon benchmarks, including Montezuma's Revenge, we demonstrate that our approach can learn significantly faster than hierarchical RL, and be significantly more label-efficient than standard IL. We also theoretically analyze labeling cost for certain instantiations of our framework.
