Bias-reduced Multi-step Hindsight Experience Replay for Efficient Multi-goal Reinforcement Learning

  • 2022-09-26 16:42:17
  • Rui Yang, Jiafei Lyu, Yu Yang, Jiangpeng Yan, Feng Luo, Dijun Luo, Lanqing Li, Xiu Li

Abstract

Multi-goal reinforcement learning is widely applied in planning and robot manipulation. Two main challenges in multi-goal reinforcement learning are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle the two challenges via goal relabeling. However, HER-related works still need millions of samples and heavy computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), incorporating multi-step relabeled returns based on $n$-step relabeling to improve sample efficiency. Despite the advantages of $n$-step relabeling, we theoretically and experimentally prove that the off-policy $n$-step bias introduced by $n$-step relabeling may lead to poor performance in many environments. To address the above issue, two bias-reduced MHER algorithms, MHER($\lambda$) and Model-based MHER (MMHER), are presented. MHER($\lambda$) exploits the $\lambda$ return while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions can successfully alleviate off-policy $n$-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER with little additional computation beyond HER.
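To make the idea of $n$-step relabeling and the $\lambda$ return more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes scalar goals, a hypothetical sparse reward (0 when the relabeled goal is reached, -1 otherwise), and bootstrap values supplied as plain numbers. The normalized exponential weighting in `lambda_target` is one plausible way to blend the $k$-step targets; the exact form used by MHER($\lambda$) may differ. All function and parameter names are illustrative.

```python
def sparse_reward(achieved, goal, tol=0.05):
    """Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0


def n_step_targets(achieved, goal, q_values, gamma, n):
    """Return the k-step targets [G^(1), ..., G^(n)] for a relabeled goal.

    achieved[k] : achieved goal after the (k+1)-th transition of the segment
    q_values[k] : assumed bootstrap value estimate at that state, for `goal`
    """
    targets = []
    reward_sum = 0.0
    for k in range(n):
        # Rewards are recomputed against the relabeled (hindsight) goal.
        reward_sum += (gamma ** k) * sparse_reward(achieved[k], goal)
        targets.append(reward_sum + (gamma ** (k + 1)) * q_values[k])
    return targets


def lambda_target(targets, lam):
    """Blend k-step targets with normalized weights lam^1, ..., lam^n (assumed form)."""
    weights = [lam ** (k + 1) for k in range(len(targets))]
    return sum(w * g for w, g in zip(weights, targets)) / sum(weights)


# Toy segment: the hindsight goal is the final achieved goal (1.0).
achieved = [0.2, 0.6, 1.0]
q_values = [-1.0, -0.5, 0.0]
targets = n_step_targets(achieved, goal=1.0, q_values=q_values, gamma=0.5, n=3)
blended = lambda_target(targets, lam=0.5)
print(targets, blended)
```

With a small `lam` the blended target leans on the one-step target, which relies least on off-policy multi-step rewards; with `lam` close to 1 it approaches a plain average of the $k$-step targets.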

 
