Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Abstract

Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.
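The abstract describes PSM only informally: two states are similar when the optimal policies agree in those states and in their future states. A minimal sketch of that idea, under several stated assumptions, is a fixed-point iteration of d(x, y) = DIST(π*(x), π*(y)) + γ · d(x', y'), where x' and y' are the successors reached by taking the optimal action. The assumptions are deterministic transitions, a discrete action space, known optimal actions, and a 0/1 action distance. The helper name `policy_similarity_metric`, the toy chain MDPs, and γ = 0.9 are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def policy_similarity_metric(pi_x, next_x, pi_y, next_y,
                             gamma=0.9, tol=1e-8, max_iters=10_000):
    """Hypothetical helper: PSM-style distance between the states of two
    deterministic MDPs X and Y, computed by fixed-point iteration.

    Assumptions (illustration only):
      pi_x[i]   -- optimal discrete action in state i of MDP X
      next_x[i] -- successor of state i when that action is taken
      (likewise pi_y / next_y for MDP Y)
    """
    pi_x, next_x = np.asarray(pi_x), np.asarray(next_x)
    pi_y, next_y = np.asarray(pi_y), np.asarray(next_y)

    # Local term: 0 if the optimal actions agree, 1 otherwise
    # (total variation between two deterministic policies).
    local = (pi_x[:, None] != pi_y[None, :]).astype(float)

    d = np.zeros_like(local)
    for _ in range(max_iters):
        # Future term: discounted distance between the successor pair.
        # d[np.ix_(next_x, next_y)][i, j] == d[next_x[i], next_y[j]]
        d_new = local + gamma * d[np.ix_(next_x, next_y)]
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d


# Toy example: two chains whose "jump" (action 1) happens at different
# positions; the last state of each chain is absorbing (self-loop).
d = policy_similarity_metric(
    pi_x=[0, 1, 0, 0], next_x=[1, 2, 3, 3],
    pi_y=[0, 0, 1, 0, 0], next_y=[1, 2, 3, 4, 4],
)
print(round(d[1, 2], 2))  # jump states align -> 0.0
print(round(d[0, 0], 2))  # same index, misaligned future -> 1.71
```

Because γ < 1 the update is a contraction, so the iteration converges. Note that states at the same index (x0, y0) end up far apart while the states that precede the jump by the same number of steps (x0, y1) end up at distance 0, which is the behavioral notion of similarity the abstract describes.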
