On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

Abstract

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy using data from several tasks represented by Markov Decision Processes (MDPs) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computation of exact gradients requires access to a large number of possible trajectories. For this formulation, we propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL), and study its convergence properties. We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms. We further show how our results extend to the case where more than one step of the stochastic policy gradient method is used at test time. Finally, we empirically compare SG-MRL and MAML in several deep RL environments.
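
The formulation described above can be sketched in symbols. The notation here is an assumption made for illustration and does not appear in the abstract: $p$ is a distribution over tasks, $J_i(\theta)$ is the expected return of policy parameters $\theta$ on MDP $i$, $\alpha > 0$ is a step size, and $\tilde{\nabla} J_i(\theta, \mathcal{D}^i)$ is a stochastic policy gradient estimate computed from a batch of trajectories $\mathcal{D}^i$ sampled in MDP $i$. Under these assumptions, a one-step meta-objective of the kind the abstract describes could be written as

$$
V_1(\theta) \;=\; \mathbb{E}_{i \sim p}\, \mathbb{E}_{\mathcal{D}^i} \left[ J_i\!\left( \theta + \alpha\, \tilde{\nabla} J_i(\theta, \mathcal{D}^i) \right) \right],
$$

and one common way to formalize an $\epsilon$-first-order stationary point is a (possibly random) output $\theta_\epsilon$ satisfying

$$
\mathbb{E}\left[ \left\| \nabla V_1(\theta_\epsilon) \right\| \right] \;\le\; \epsilon .
$$

The outer expectation over $\mathcal{D}^i$ reflects that the adaptation step itself uses a sampled gradient rather than the exact one, which is the feature of the RL setting that the abstract emphasizes.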
