Hierarchical Reinforcement Learning with Hindsight

Abstract

Reinforcement Learning (RL) algorithms can suffer from poor sample efficiency when rewards are delayed and sparse. We introduce a solution that enables agents to learn temporally extended actions at multiple levels of abstraction in a sample-efficient and automated fashion. Our approach combines universal value functions and hindsight learning, allowing agents to learn policies belonging to different time scales in parallel. We show that our method significantly accelerates learning in a variety of discrete and continuous tasks.
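
To make the hindsight ingredient concrete, the following is a minimal, generic sketch of hindsight goal relabeling for a goal-conditioned (universal) value function, in the spirit of Hindsight Experience Replay. It is not the algorithm proposed in this paper. The Transition layout, the sparse reward of 0 on reaching the goal and -1 otherwise, the distance tolerance tol, and the helper names sparse_reward and hindsight_relabel are all illustrative assumptions. The idea shown is that an episode which failed to reach its intended goal is reused as successful experience for the state the agent actually reached, which gives a goal-conditioned learner a non-trivial reward signal even when the true reward is sparse.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, ...]


@dataclass
class Transition:
    """Hypothetical goal-conditioned transition (s, a, s', g, r)."""
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State, tol: float = 0.1) -> float:
    """Assumed sparse reward: 0 when within tol of the goal, -1 otherwise."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def hindsight_relabel(episode: List[Transition], tol: float = 0.1) -> List[Transition]:
    """Return the original transitions plus copies whose goal is replaced
    by the final state the agent actually reached (the 'final' strategy)."""
    if not episode:
        return []
    hindsight_goal = episode[-1].next_state  # what the agent actually achieved
    relabeled = [
        Transition(
            t.state,
            t.action,
            t.next_state,
            hindsight_goal,
            # Recompute the reward as if hindsight_goal had been the target.
            sparse_reward(t.next_state, hindsight_goal, tol),
        )
        for t in episode
    ]
    return list(episode) + relabeled
```

For example, an agent on a line that aimed for position 5.0 but stopped at 2.0 receives -1 on every original transition; after relabeling, the copy of its last transition is rewarded with 0 for the goal (2.0,), so both the original and the relabeled transitions can be stored in a replay buffer and used to train the same goal-conditioned value function.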
