Hyper-Meta Reinforcement Learning with Sparse Reward

Abstract

Despite their success, existing meta reinforcement learning methods still have difficulty learning a meta policy effectively for RL problems with sparse reward. To this end, we develop a novel meta reinforcement learning framework, Hyper-Meta RL (HMRL), for sparse reward RL problems. It consists of meta state embedding, meta reward shaping, and meta policy learning modules. The cross-environment meta state embedding module constructs a common meta state space to adapt to different environments. The meta-state-based, environment-specific meta reward shaping effectively extends the original sparse reward trajectory by cross-environmental knowledge complementarity. As a consequence, the meta policy achieves better generalization and efficiency with the shaped meta reward. Experiments with sparse reward show the superiority of HMRL in both transferability and policy learning efficiency.
