Abstract
Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal. While several methods have been proposed to improve the sample efficiency of GCRL, one relatively under-studied approach is the design of neural architectures to support sample efficiency. In this work, we introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly-used monolithic network architecture. The key insight is that the optimal action-value function Q^*(s,a,g) must satisfy the triangle inequality in a specific sense. Furthermore, we introduce the metric residual network (MRN) that deliberately decomposes the action-value function Q(s,a,g) into the negated summation of a metric plus a residual asymmetric component. MRN provably approximates any optimal action-value function Q^*(s,a,g), thus making it a fitting neural architecture for GCRL. We conduct comprehensive experiments across 12 standard benchmark environments in GCRL. The empirical results demonstrate that MRN uniformly outperforms other state-of-the-art GCRL neural architectures in terms of sample efficiency.
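The decomposition described above, Q(s,a,g) as the negated sum of a metric and an asymmetric residual, can be illustrated with a minimal NumPy sketch. The abstract does not specify the concrete form of either component, so everything below is an assumption made for illustration: the metric is taken to be a Euclidean distance between learned embeddings, the residual is taken to be a max-of-ReLU difference, and the helper names (`make_mlp`, `metric_part`, `residual_part`, `q_value`), the layer sizes, and the random untrained weights are all hypothetical.

```python
import numpy as np


def make_mlp(rng, in_dim, hidden_dim, out_dim):
    """Random two-layer MLP weights (hypothetical stand-in for trained parameters)."""
    return {
        "W1": rng.normal(scale=0.5, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.5, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Apply a two-layer MLP with a ReLU hidden layer."""
    hidden = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    return hidden @ params["W2"] + params["b2"]


def metric_part(params, x, y):
    """Assumed symmetric component: Euclidean distance between embeddings.

    It is symmetric, non-negative, and satisfies the triangle inequality
    by construction.
    """
    return np.linalg.norm(mlp(params, x) - mlp(params, y), axis=-1)


def residual_part(params, x, y):
    """Assumed asymmetric component: max_i relu(h_i(x) - h_i(y)).

    It is non-negative, but swapping x and y generally changes its value.
    """
    diff = mlp(params, x) - mlp(params, y)
    return np.max(np.maximum(diff, 0.0), axis=-1)


def q_value(params, s, a, g):
    """Q(s, a, g) = -(metric + asymmetric residual), computed on latent codes."""
    x = mlp(params["enc_sa"], np.concatenate([s, a], axis=-1))  # encode (s, a)
    y = mlp(params["enc_g"], g)                                 # encode the goal
    return -(metric_part(params["sym"], x, y)
             + residual_part(params["asym"], x, y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state_dim, action_dim, goal_dim, latent_dim = 6, 2, 3, 8  # illustrative sizes
    params = {
        "enc_sa": make_mlp(rng, state_dim + action_dim, 16, latent_dim),
        "enc_g": make_mlp(rng, goal_dim, 16, latent_dim),
        "sym": make_mlp(rng, latent_dim, 16, 4),
        "asym": make_mlp(rng, latent_dim, 16, 4),
    }
    s = rng.normal(size=(5, state_dim))
    a = rng.normal(size=(5, action_dim))
    g = rng.normal(size=(5, goal_dim))
    print(q_value(params, s, a, g))  # five non-positive values
```

Because both components are non-negative in this sketch, the resulting Q is always non-positive, which fits a setting where each step before reaching the goal incurs a cost. Whether this matches the reward convention used in the experiments is not stated in the abstract.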