Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning

Abstract

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy (using data from several tasks represented by Markov Decision Processes (MDPs)) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in the MAML update step is crucial for RL problems, since computing exact gradients requires access to a large number of possible trajectories. For this formulation, we propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL), and study its convergence properties. We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms. We further show how our results extend to the case where more than one step of the stochastic policy gradient method is used in the update at test time.
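
The formulation described above can be written down as a rough sketch. The notation here is assumed for illustration and does not appear in the abstract itself: $p$ is the distribution over tasks, $J_i(\theta)$ is the expected return of the policy with parameters $\theta$ on MDP $i$, $\alpha$ is the adaptation step size, and $\tilde{\nabla} J_i(\theta, \mathcal{D})$ is a stochastic policy gradient estimate computed from a batch of trajectories $\mathcal{D}$. Under these assumptions, the one-step meta-objective and the stationarity target would read approximately as:

$$
\max_{\theta} \; V_1(\theta) := \mathbb{E}_{i \sim p} \, \mathbb{E}_{\mathcal{D}} \left[ J_i\!\left( \theta + \alpha \, \tilde{\nabla} J_i(\theta, \mathcal{D}) \right) \right],
\qquad
\mathbb{E}\left[ \left\| \nabla V_1(\theta_\epsilon) \right\| \right] \le \epsilon .
$$

The inner expectation over $\mathcal{D}$ is what distinguishes this objective from one that adapts with the exact gradient $\nabla J_i(\theta)$: the adapted policy itself is random, because it depends on the sampled trajectories.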
