FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Abstract

We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks without any interactions with the environments, making RL truly practical in many real-world applications. This problem is still not fully understood, and two major challenges need to be addressed. First, offline RL usually suffers from bootstrapping errors on out-of-distribution state-action pairs, which lead to divergence of value functions. Second, meta-RL requires efficient and robust task inference learned jointly with the control policy. In this work, we enforce behavior regularization on the learned policy as a general approach to offline RL, combined with a deterministic context encoder for efficient task inference. We propose a novel negative-power distance metric on a bounded context embedding space, whose gradient propagation is detached from the Bellman backup. We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches involving meta-RL and distance metric learning. To the best of our knowledge, our method is the first model-free and end-to-end OMRL algorithm, which is computationally efficient and demonstrated to outperform prior algorithms on several meta-RL benchmarks.
