Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

Abstract

Reinforcement learning (RL) agents are widely used for solving complex sequential decision-making tasks, but still have difficulty generalizing to scenarios not seen during training. Prior online approaches demonstrated that additional signals beyond the reward function, such as self-supervised learning (SSL), can lead to better generalization in RL agents, but these approaches struggle in the offline RL setting, that is, when learning from a static dataset. We show that the performance of online algorithms for generalization in RL can be hindered in the offline setting by poor estimation of the similarity between observations. We propose a new theoretically motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior, where we quantify this similarity using generalized value functions. We show that GSF is general enough to recover existing SSL objectives while also improving zero-shot generalization performance on a complex offline RL benchmark, offline Procgen.
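
To make the central idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each observation already has a scalar generalized value function (GVF) estimate, pairs each observation with the one whose estimate is closest, and scores an encoder's embeddings with a standard InfoNCE-style contrastive loss. The abstract does not specify the loss, the similarity measure, or the pairing rule; the function names pair_by_gvf, cosine, and info_nce and the temperature value are all hypothetical choices made for illustration.

```python
import math


def pair_by_gvf(gvf_values):
    """For each observation, return the index of the other observation
    whose (assumed scalar) GVF estimate is closest."""
    positives = []
    for i, g_i in enumerate(gvf_values):
        best_j, best_dist = None, float("inf")
        for j, g_j in enumerate(gvf_values):
            if j == i:
                continue
            dist = abs(g_i - g_j)
            if dist < best_dist:
                best_j, best_dist = j, dist
        positives.append(best_j)
    return positives


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(embeddings, positives, temperature=0.5):
    """Mean InfoNCE loss: each anchor should be closer to its GVF-matched
    positive than to every other observation in the batch."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = [cosine(embeddings[i], embeddings[j]) / temperature
                  for j in others]
        pos_logit = logits[others.index(positives[i])]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / n


# Toy batch: observations 0 and 1 share similar expected future behavior,
# as do observations 2 and 3 (hypothetical GVF estimates).
gvf_values = [0.1, 0.15, 0.9, 1.0]
positives = pair_by_gvf(gvf_values)

# Embeddings that group GVF-similar observations together ...
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# ... versus embeddings that group them incorrectly.
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_aligned = info_nce(aligned, positives)
loss_misaligned = info_nce(misaligned, positives)
```

In this toy batch the loss is lower when the embeddings place GVF-similar observations close together, which is the behavior a contrastive objective of this kind would encourage in the encoder. Nearest-neighbor pairing on a scalar is only one possible way to turn GVF similarity into positive pairs.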
