Abstract
Multi-goal reinforcement learning is widely applied in planning and robot manipulation. Two main challenges in multi-goal reinforcement learning are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle both challenges via goal relabeling. However, HER-related works still need millions of samples and substantial computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), which incorporates multi-step relabeled returns based on $n$-step relabeling to improve sample efficiency. Despite the advantages of $n$-step relabeling, we show theoretically and experimentally that the off-policy $n$-step bias introduced by $n$-step relabeling may lead to poor performance in many environments. To address this issue, we present two bias-reduced MHER algorithms: MHER($\lambda$) and Model-based MHER (MMHER). MHER($\lambda$) exploits the $\lambda$ return, while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions successfully alleviate the off-policy $n$-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER, with little additional computation beyond HER.
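To make the two target types concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Fetch-style sparse reward (0 when the achieved goal lies within a hypothetical tolerance `eps` of the goal, -1 otherwise), a relabeled goal taken from a later achieved goal (HER's "future" strategy), and precomputed bootstrap estimates of $Q(s_{t+n}, \pi(s_{t+n}), g')$. The $n$-step target sums $n$ relabeled rewards collected by the behavior policy before bootstrapping, which is where the off-policy $n$-step bias comes from. The $\lambda$ variant shown here is a truncated, normalized $\lambda$-weighted average of the $1..N$-step targets; the exact weighting used by MHER($\lambda$) is an assumption.

```python
import numpy as np


def sparse_reward(achieved_goal, goal, eps=0.05):
    # Assumed Fetch-style sparse reward: 0 on success, -1 otherwise.
    dist = np.linalg.norm(np.asarray(achieved_goal, dtype=float) - np.asarray(goal, dtype=float))
    return 0.0 if dist <= eps else -1.0


def n_step_relabeled_target(achieved_goals, goal, bootstrap_values, n, gamma=0.98):
    """n-step target for a relabeled goal.

    achieved_goals[i]: achieved goal after step t+i, i.e. in state s_{t+i+1}.
    bootstrap_values[i]: estimate of Q(s_{t+i+1}, pi(s_{t+i+1}), goal).
    Rewards come from behavior-policy actions with no off-policy correction.
    """
    ret = 0.0
    for i in range(n):
        ret += gamma**i * sparse_reward(achieved_goals[i], goal)
    return ret + gamma**n * bootstrap_values[n - 1]


def lambda_relabeled_target(achieved_goals, goal, bootstrap_values, n_max, lam=0.7, gamma=0.98):
    # Hypothetical truncated lambda-return: weights lam^(n-1), normalized to sum to 1.
    targets = np.array([
        n_step_relabeled_target(achieved_goals, goal, bootstrap_values, n, gamma)
        for n in range(1, n_max + 1)
    ])
    weights = np.array([lam ** (n - 1) for n in range(1, n_max + 1)])
    return float(np.dot(weights, targets) / weights.sum())


# Toy segment: relabel with the last achieved goal of the segment.
achieved = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
relabeled_goal = achieved[-1]
q_boot = [-1.0, -0.5, 0.0]  # made-up bootstrap estimates for illustration

for n in (1, 2, 3):
    print(n, n_step_relabeled_target(achieved, relabeled_goal, q_boot, n, gamma=0.9))
print("lambda", lambda_relabeled_target(achieved, relabeled_goal, q_boot, 3, lam=0.5, gamma=0.9))
```

Setting `lam=0` recovers the 1-step (HER-style) target, and larger `lam` shifts weight toward longer, more biased multi-step targets, which is the trade-off the bias-reduced variants are meant to manage.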