ProMP: Proximal Meta-Policy Search

Abstract

Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on these insights, we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm enables efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. Our code is available at https://github.com/jonasrothfuss/promp.
