Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces

Abstract

Meta-reinforcement learning (RL) addresses the problem of sample inefficiency in deep RL by using experience obtained in past tasks for a new task to be solved. However, most meta-RL methods require partially or fully on-policy data, i.e., they cannot reuse the data collected by past policies, which hinders the improvement of sample efficiency. To alleviate this problem, we propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE). An ELUE agent is characterized by the learning of a feature embedding space shared among tasks. It learns a belief model over the embedding space and a belief-conditional policy and Q-function. Then, for a new task, it collects data by the pretrained policy, and updates its belief based on the belief model. Thanks to the belief update, the performance can be improved with a small amount of data. In addition, it updates the parameters of the neural networks to adjust the pretrained relationships when there are enough data. We demonstrate that ELUE outperforms state-of-the-art meta-RL methods through experiments on meta-RL benchmarks.
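
To make the belief update over an embedding space concrete, the following is a minimal sketch of a generic conjugate Gaussian update, not the belief model used by ELUE. It assumes a diagonal Gaussian belief over a task embedding and noisy per-transition "evidence" vectors with known variance; the function name `update_belief`, the parameter `obs_var`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def update_belief(mean, var, observations, obs_var):
    """Conjugate update of a diagonal Gaussian belief over a task embedding.

    Assumption (not from the paper): each observation is the true embedding
    plus independent Gaussian noise with known variance `obs_var`.
    """
    observations = np.atleast_2d(observations)
    n = observations.shape[0]
    prior_precision = 1.0 / var
    # Each observation contributes 1 / obs_var of precision per dimension.
    post_var = 1.0 / (prior_precision + n / obs_var)
    post_mean = post_var * (
        prior_precision * mean + observations.sum(axis=0) / obs_var
    )
    return post_mean, post_var


# Toy usage: the belief sharpens as data from the new task accumulates.
rng = np.random.default_rng(0)
true_embedding = np.array([1.0, -0.5])  # hypothetical task embedding
data = true_embedding + 0.3 * rng.standard_normal((20, 2))

mean, var = np.zeros(2), np.ones(2)  # prior belief before seeing the task
for batch in (data[:2], data[2:]):
    mean, var = update_belief(mean, var, batch, obs_var=0.09)
    print("mean:", mean, "variance:", var)
```

In this sketch the posterior variance shrinks with every batch, which mirrors the abstract's point that a belief-conditional policy can adapt from a small amount of data before any network parameters are changed.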
