Curriculum goal masking for continuous deep reinforcement learning

Abstract

Deep reinforcement learning has recently gained a focus on problems where policy or value functions are independent of goals. Evidence exists that the sampling of goals has a strong effect on learning performance, but general mechanisms for optimizing the goal sampling process are lacking. In this work, we present a simple and general goal masking method that also allows us to estimate a goal's difficulty level and thus realize a curriculum learning approach for deep RL. Our results indicate that focusing on goals with a medium difficulty level is appropriate for deep deterministic policy gradient (DDPG) methods, while an "aim for the stars and reach the moon" strategy, where hard goals are sampled much more often than simple goals, leads to the best learning performance in cases where DDPG is combined with hindsight experience replay (HER). We demonstrate that the approach significantly outperforms standard goal sampling for different robotic object manipulation problems.
