Curriculum in Gradient-Based Meta-Reinforcement Learning

Abstract

Gradient-based meta-learners such as Model-Agnostic Meta-Learning (MAML) have shown strong few-shot performance in supervised and reinforcement learning settings. However, specifically in the case of meta-reinforcement learning (meta-RL), we show that gradient-based meta-learners are sensitive to task distributions. With the wrong curriculum, agents suffer the effects of meta-overfitting, shallow adaptation, and adaptation instability. In this work, we begin by highlighting intriguing failure cases of gradient-based meta-RL and show that task distributions can wildly affect algorithmic outputs, stability, and performance. To address this problem, we leverage insights from recent literature on domain randomization and propose meta Active Domain Randomization (meta-ADR), which learns a curriculum of tasks for gradient-based meta-RL, much as ADR does for sim2real transfer. We show that this approach induces more stable policies on a variety of simulated locomotion and navigation tasks. We assess in- and out-of-distribution generalization and find that the learned task distributions, even in an unstructured task space, greatly improve the adaptation performance of MAML. Finally, we motivate the need for better benchmarking in meta-RL that prioritizes generalization over single-task adaptation performance.
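The dependence of a gradient-based meta-learner's output on its task distribution can be seen even in a toy setting. The sketch below is not meta-ADR and not the setup used in this work; it is a minimal illustration under assumed conditions: one-dimensional quadratic task losses L(θ) = 0.5 · a · (θ − c)², a single inner-loop gradient step, the exact (second-order) MAML meta-gradient, and a fixed probability vector standing in for the task distribution. The names `inner_step`, `post_adaptation_loss`, `meta_gradient`, and `maml` are hypothetical. Under these assumptions the meta-learned initialization converges to a weighted average of the task optima, so reweighting the same tasks moves it.

```python
import numpy as np

def inner_step(theta, a, c, alpha):
    # One inner-loop (adaptation) gradient step on L(θ) = 0.5 * a * (θ - c)^2.
    return theta - alpha * a * (theta - c)

def post_adaptation_loss(theta, a, c, alpha):
    # Task loss evaluated after the single adaptation step.
    return 0.5 * a * (inner_step(theta, a, c, alpha) - c) ** 2

def meta_gradient(theta, a, c, alpha):
    # Exact MAML gradient of post_adaptation_loss w.r.t. θ, including the
    # second-order factor d(θ')/dθ = 1 - α a.
    return a * (1.0 - alpha * a) ** 2 * (theta - c)

def maml(tasks, probs, alpha=0.1, beta=0.5, steps=500):
    # Deterministic meta-training: step along the expected meta-gradient
    # under the task distribution `probs` (a hypothetical fixed curriculum).
    a, c = tasks[:, 0], tasks[:, 1]
    theta = 0.0
    for _ in range(steps):
        theta -= beta * np.dot(probs, meta_gradient(theta, a, c, alpha))
    return theta

# Each row is one hypothetical task: (curvature a, optimum c).
tasks = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]])
uniform = np.full(3, 1 / 3)
skewed = np.array([0.6, 0.3, 0.1])

theta_u = maml(tasks, uniform)  # ≈ 0.0
theta_s = maml(tasks, skewed)   # ≈ -1.0

# Post-adaptation loss on the task with optimum c = 2 under each initialization.
print(post_adaptation_loss(theta_u, 1.0, 2.0, 0.1))  # ≈ 1.62
print(post_adaptation_loss(theta_s, 1.0, 2.0, 0.1))  # ≈ 3.645
```

Because every task here shares the same curvature, the fixed point is simply the probability-weighted mean of the optima c, which makes the effect of the distribution easy to read off. With unequal curvatures the weights become p · a · (1 − α a)², so the same reasoning applies with less transparent numbers.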
