Abstract
Meta-reinforcement learning (meta-RL) aims to learn, from multiple training tasks, the ability to adapt efficiently to unseen test tasks. Despite their success, existing meta-RL algorithms are known to be sensitive to task distribution shift: when the test task distribution differs from the training task distribution, performance may degrade significantly. To address this issue, this paper proposes Model-based Adversarial Meta-Reinforcement Learning (AdMRL), a model-based approach that aims to minimize the worst-case suboptimality gap (the difference between the optimal return and the return that the algorithm achieves after adaptation) across all tasks in a family of tasks. We propose a minimax objective and optimize it by alternating between learning the dynamics model on a fixed task and finding the adversarial task for the current model, that is, the task for which the policy induced by the model is maximally suboptimal. Assuming the family of tasks is parameterized, we derive a formula for the gradient of the suboptimality with respect to the task parameters via the implicit function theorem, and show how the gradient estimator can be implemented efficiently using the conjugate gradient method and a novel use of the REINFORCE estimator. We evaluate our approach on several continuous control benchmarks and demonstrate that it outperforms existing state-of-the-art meta-RL algorithms in worst-case performance over all tasks, in generalization to out-of-distribution tasks, and in training-time and test-time sample efficiency.
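To make the implicit-function-theorem step concrete, the following is a minimal sketch on a toy problem, not the AdMRL implementation. It assumes a quadratic inner objective f(theta, psi) = 0.5 theta^T A theta - theta^T B psi standing in for policy optimization, and a squared-distance outer loss standing in for the suboptimality gap; the REINFORCE estimator is not modeled. All names (`conjugate_gradient`, `theta_star`, `outer_loss`, `implicit_gradient`, `A`, `B`, `c`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def conjugate_gradient(hvp, b, iters=50, tol=1e-16):
    """Solve H x = b for a symmetric positive-definite H, given only x -> H x."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - H x (x starts at zero)
    p = r.copy()          # current search direction
    rs = r @ r
    for _ in range(iters):
        if rs < tol:      # squared residual norm is small enough
            break
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy problem (assumption): A is SPD, psi plays the role of task parameters.
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4.0 * np.eye(4)   # Hessian of the inner objective w.r.t. theta
B = rng.normal(size=(4, 2))     # couples theta to the task parameters psi
c = rng.normal(size=4)          # target used by the outer loss

def theta_star(psi):
    """Inner solution: argmin_theta 0.5 theta^T A theta - theta^T B psi."""
    return np.linalg.solve(A, B @ psi)

def outer_loss(psi):
    """Outer loss evaluated at the inner solution."""
    return 0.5 * np.sum((theta_star(psi) - c) ** 2)

def implicit_gradient(psi):
    """d outer_loss / d psi via the implicit function theorem.

    d theta*/d psi = -H^{-1} (d^2 f / d theta d psi) = A^{-1} B, so the
    gradient is B^T A^{-1} grad_theta. The linear solve uses conjugate
    gradient with Hessian-vector products only, never forming A^{-1}.
    """
    grad_theta = theta_star(psi) - c
    v = conjugate_gradient(lambda p: A @ p, grad_theta)
    return B.T @ v

psi0 = np.array([0.3, -0.7])
grad = implicit_gradient(psi0)
```

In this sketch the Hessian is available as an explicit matrix, so conjugate gradient is not strictly needed; it is used to show that only Hessian-vector products are required, which is what matters when the Hessian is too large to form.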