GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning

Abstract

Offline reinforcement learning approaches can generally be divided into proximal and uncertainty-aware methods. In this work, we demonstrate the benefit of combining the two in a latent variational model. We impose a latent representation of states and actions and leverage its intrinsic Riemannian geometry to measure the distance of latent samples to the data. Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data. We integrate our metrics in a model-based offline optimization framework, in which proximity and uncertainty can be carefully controlled. We illustrate the geodesics on a simple grid-like environment, depicting its natural inherent topology. Finally, we analyze our approach and improve upon contemporary offline RL benchmarks.
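
To make the idea of measuring latent distances through Riemannian geometry concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical toy decoder `decoder` (a fixed paraboloid map rather than a learned variational model) and uses the standard pullback metric G(z) = J(z)^T J(z), where J is the decoder Jacobian. The helper names `jacobian`, `pullback_metric`, and `curve_length` are illustrative. The sketch measures the length of a straight latent segment under that metric; a geodesic distance would additionally minimize this length over all curves joining the two points.

```python
import numpy as np

def decoder(z):
    # Hypothetical toy decoder (an assumption, not the paper's model):
    # maps a 2-D latent point onto a paraboloid surface in 3-D.
    z = np.asarray(z, dtype=float)
    return np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])

def jacobian(f, z, eps=1e-6):
    # Central finite-difference Jacobian of f at z, shape (out_dim, latent_dim).
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((f(z + dz) - f(z - dz)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def pullback_metric(f, z):
    # Pullback of the Euclidean metric through f: G(z) = J(z)^T J(z).
    J = jacobian(f, z)
    return J.T @ J

def curve_length(f, z_start, z_end, n_steps=200):
    # Length of the straight latent segment from z_start to z_end under the
    # pullback metric, approximated with the midpoint rule.
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    velocity = z_end - z_start          # constant derivative of the segment
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    midpoints = 0.5 * (ts[:-1] + ts[1:])
    dt = 1.0 / n_steps
    total = 0.0
    for t in midpoints:
        G = pullback_metric(f, z_start + t * velocity)
        total += np.sqrt(velocity @ G @ velocity) * dt
    return total

length = curve_length(decoder, [0.0, 0.0], [1.0, 0.0])
euclidean = np.linalg.norm(np.array([1.0, 0.0]) - np.array([0.0, 0.0]))
```

In this toy setting `length` exceeds `euclidean` because the decoder bends the latent plane; the same quantity under a learned decoder is what allows latent distance to reflect the structure of the data rather than raw latent coordinates.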
