On First-Order Meta-Reinforcement Learning with Moreau Envelopes

Abstract

Meta-Reinforcement Learning (MRL) is a promising framework for training agents that can quickly adapt to new environments and tasks. In this work, we study the MRL problem under the policy gradient formulation and propose a novel algorithm that uses Moreau envelope surrogate regularizers to jointly learn a meta-policy that can be adjusted to the environment of each individual task. Our algorithm, called Moreau Envelope Meta-Reinforcement Learning (MEMRL), learns a meta-policy that adapts to a distribution of tasks by efficiently updating the policy parameters through a combination of gradient-based optimization and Moreau envelope regularization. Moreau envelopes provide a smooth approximation of the policy optimization problem, which lets us apply standard optimization techniques and converge to an appropriate stationary point. We provide a detailed analysis of the MEMRL algorithm and show a sublinear convergence rate to a first-order stationary point for non-convex policy gradient optimization. Finally, we demonstrate the effectiveness of MEMRL on a multi-task 2D-navigation problem.
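The Moreau envelope mechanism described above can be illustrated with a small toy sketch. For a task loss f_i and regularization strength lam, the envelope is M_i(w) = min over theta of f_i(theta) + (lam/2)||theta - w||^2, and when f_i is smooth enough (for example, weakly convex with lam large enough) its gradient is lam * (w - theta_i(w)), where theta_i(w) is the inner minimizer. The meta-update therefore needs only first-order information. This is an assumption-laden illustration, not the authors' MEMRL: it replaces the negative expected return of a policy with a hypothetical quadratic loss f_i(theta) = 0.5 * ||theta - g_i||^2 for a 2D goal g_i, uses exact gradients instead of policy-gradient estimates, and all function names (task_loss_grad, inner_prox, meta_step, train) and hyperparameters are illustrative choices.

```python
"""Toy sketch of first-order meta-learning with Moreau envelopes.

Assumption: a quadratic "distance to goal" loss stands in for the negative
expected return of a policy. This is NOT the MEMRL policy-gradient method.
"""
from typing import List, Sequence

Vector = List[float]


def task_loss_grad(theta: Sequence[float], goal: Sequence[float]) -> Vector:
    # Hypothetical task loss f_i(theta) = 0.5 * ||theta - goal||^2
    # (a 2D "navigate to the goal" toy); its gradient is theta - goal.
    return [t - g for t, g in zip(theta, goal)]


def inner_prox(w: Sequence[float], goal: Sequence[float], lam: float,
               lr: float = 0.1, steps: int = 200) -> Vector:
    # Approximately solve theta_i(w) = argmin f_i(theta) + (lam/2)||theta - w||^2
    # with plain gradient descent, starting from the meta-parameters w.
    theta = list(w)
    for _ in range(steps):
        g = task_loss_grad(theta, goal)
        theta = [t - lr * (gi + lam * (t - wi))
                 for t, gi, wi in zip(theta, g, w)]
    return theta


def meta_step(w: Sequence[float], goals: Sequence[Sequence[float]],
              lam: float, beta: float) -> Vector:
    # First-order meta-gradient of the average envelope:
    # grad M_i(w) = lam * (w - theta_i(w)); no Hessians are required.
    grad = [0.0] * len(w)
    for goal in goals:
        theta = inner_prox(w, goal, lam)
        for k in range(len(w)):
            grad[k] += lam * (w[k] - theta[k]) / len(goals)
    return [wk - beta * gk for wk, gk in zip(w, grad)]


def train(goals: Sequence[Sequence[float]], lam: float = 1.0,
          beta: float = 0.5, iters: int = 200) -> Vector:
    # Meta-parameters w play the role of the shared meta-policy.
    w = [0.0, 0.0]
    for _ in range(iters):
        w = meta_step(w, goals, lam, beta)
    return w


if __name__ == "__main__":
    goals = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
    w = train(goals)
    print(w)  # approximately [1.0, 1.0], the mean of the goals
    # Per-task adaptation: each theta_i lies between w and its own goal.
    print([inner_prox(w, g, 1.0) for g in goals])
```

In this quadratic toy the inner minimizer is theta_i(w) = (g_i + lam * w) / (1 + lam), so larger lam keeps each task's adapted parameters closer to the shared meta-parameters, while smaller lam lets them move further toward each task's own optimum.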
