Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning

Abstract

In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.
