Abstract
Despite their success, existing meta reinforcement learning methods still have difficulty learning a meta policy effectively for RL problems with sparse rewards. To address this, we develop a novel meta reinforcement learning framework, Hyper-Meta RL (HMRL), for sparse-reward RL problems. It consists of three modules: meta state embedding, meta reward shaping, and meta policy learning. The cross-environment meta state embedding module constructs a common meta state space to adapt to different environments. The environment-specific meta reward shaping module, built on the meta states, effectively extends the original sparse-reward trajectories through cross-environment knowledge complementarity. As a consequence, the meta policy achieves better generalization and efficiency with the shaped meta reward. Experiments on sparse-reward problems show the superiority of HMRL in both transferability and policy learning efficiency.