Contrastive Learning as Goal-Conditioned Reinforcement Learning

Abstract

In reinforcement learning (RL), it is easier to solve a task if given a good representation. While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable and instead equips RL algorithms with additional representation learning parts (e.g., auxiliary losses, data augmentation). How can we design RL algorithms that directly acquire good representations? In this paper, instead of adding representation learning parts to an existing RL algorithm, we show that (contrastive) representation learning methods can be cast as RL algorithms in their own right. To do this, we build upon prior work and apply contrastive representation learning to action-labeled trajectories, in such a way that the (inner product of) learned representations exactly corresponds to a goal-conditioned value function. We use this idea to reinterpret a prior RL method as performing contrastive learning, and then use the idea to propose a much simpler method that achieves similar performance. Across a range of goal-conditioned RL tasks, we demonstrate that contrastive RL methods achieve higher success rates than prior non-contrastive methods, including in the offline RL setting. We also show that contrastive RL outperforms prior methods on image-based tasks, without using data augmentation or auxiliary objectives.
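
The central idea, an inner product of learned representations acting as a goal-conditioned critic, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy vectors, and the choice of an InfoNCE-style softmax cross-entropy objective are assumptions made for illustration, since the abstract does not specify the exact contrastive loss. Each state-action representation is paired with the representation of a goal reached later in the same trajectory (the positive), and the other goals in the batch serve as negatives.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def critic(sa_repr, goal_repr):
    """Hypothetical critic f(s, a, g) = phi(s, a) . psi(g).

    sa_repr stands in for a learned state-action representation phi(s, a),
    goal_repr for a learned goal representation psi(g).
    """
    return dot(sa_repr, goal_repr)


def contrastive_loss(sa_reprs, goal_reprs):
    """InfoNCE-style loss (an assumed objective, not taken from the source).

    Row i treats goal_reprs[i] as the positive for sa_reprs[i] and every
    other goal in the batch as a negative. Returns the mean cross-entropy.
    """
    n = len(sa_reprs)
    total = 0.0
    for i in range(n):
        logits = [critic(sa_reprs[i], goal_reprs[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the batch of goals.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / n


# Toy batch: representations aligned with their own goals versus swapped goals.
sa_batch = [[5.0, 0.0], [0.0, 5.0]]
goals_matched = [[1.0, 0.0], [0.0, 1.0]]
goals_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = contrastive_loss(sa_batch, goals_matched)
loss_swapped = contrastive_loss(sa_batch, goals_swapped)
```

In this toy batch, `loss_matched` is smaller than `loss_swapped`, because the critic assigns its highest score to the goal that belongs with each state-action pair. Under the abstract's framing, a policy could then choose actions that maximize `critic(phi(s, a), psi(g))` for a commanded goal.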
