Abstract
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients. Meta-RL (MRL) addresses this issue by learning a meta-policy that adapts to new tasks. Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty. This limits system reliability whenever test tasks are not known in advance. In this work, we propose a robust MRL objective with a controlled robustness level. Optimization of analogous robust objectives in RL often leads to both biased gradients and data inefficiency. We prove that the former disappears in MRL, and address the latter via the novel Robust Meta RL algorithm (RoML). RoML is a meta-algorithm that generates a robust version of any given MRL algorithm, by identifying and over-sampling harder tasks throughout training. We demonstrate that RoML learns substantially different meta-policies and achieves robust returns on several navigation and continuous control benchmarks.
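To make the contrast between an average-return objective and a robust one concrete, the following is a minimal illustrative sketch, not the RoML algorithm itself. It assumes that the "controlled robustness level" can be modeled as a fraction `alpha` of worst-performing tasks (a CVaR-style tail average); the abstract does not specify this form. The function names, the task labels, and the toy returns are all hypothetical.

```python
import math
import random


def mean_return(returns):
    """Average return over tasks (the standard MRL objective)."""
    return sum(returns) / len(returns)


def tail_return(returns, alpha):
    """Mean of the worst ceil(alpha * n) returns.

    Assumption: robustness level alpha in (0, 1]; alpha = 1 recovers the mean.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


def oversample_hard_tasks(tasks, returns, alpha, n_samples, rng):
    """Draw a training batch only from the worst alpha-fraction of tasks.

    A crude stand-in for "identifying and over-sampling harder tasks":
    tasks are ranked by their most recent return, lowest first.
    """
    k = max(1, math.ceil(alpha * len(tasks)))
    order = sorted(range(len(tasks)), key=lambda i: returns[i])
    hard = [tasks[i] for i in order[:k]]
    return [rng.choice(hard) for _ in range(n_samples)]


# Toy data: eight hypothetical tasks, two of which are much harder.
tasks = ["a", "b", "c", "d", "e", "f", "g", "h"]
returns = [10, 9, 8, 7, 6, 5, -4, -6]

print(mean_return(returns))        # 4.375
print(tail_return(returns, 0.25))  # -5.0
batch = oversample_hard_tasks(tasks, returns, 0.25, 20, random.Random(0))
```

In this toy example the average return looks acceptable while the tail average exposes the two failing tasks, which is the gap a robust objective is meant to close. Sampling exclusively from the hard tasks is a simplification chosen for brevity; a practical sampler would need to keep re-estimating which tasks are hard as the meta-policy changes.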