Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

Abstract

Meta-reinforcement learning (Meta-RL) has attracted attention due to its capability to enhance reinforcement learning (RL) algorithms, in terms of data efficiency and generalizability. In this paper, we develop a bilevel optimization framework for meta-RL (BO-MRL) to learn the meta-prior for task-specific policy adaptation, which implements multiple-step policy optimization on one-time data collection. Beyond existing meta-RL analyses, we provide upper bounds of the expected optimality gap over the task distribution. This metric measures the distance of the policy adaptation from the learned meta-prior to the task-specific optimum, and quantifies the model's generalizability to the task distribution. We empirically validate the correctness of the derived upper bounds and demonstrate the superior effectiveness of the proposed algorithm over benchmarks.
