Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning

Abstract

Goal-Conditioned Reinforcement Learning (GCRL) enables agents to autonomously acquire diverse behaviors, but faces major challenges in visual environments due to high-dimensional, semantically sparse observations. In the online setting, where agents learn representations while exploring, the latent space evolves with the agent's policy to capture newly discovered areas of the environment. However, without an incentive to maximize state coverage in the representation, classical approaches based on auto-encoders may converge to latent spaces that over-represent a restricted set of states frequently visited by the agent. This is exacerbated in an intrinsic motivation setting, where the agent uses the distribution encoded in the latent space to sample the goals it learns to master. To address this issue, we propose to progressively enforce distributional shifts towards a uniform distribution over the full state space, to ensure full coverage of the skills that can be learned in the environment. We introduce DRAG (Distributionally Robust Auto-Encoding for GCRL), a method that combines the $\beta$-VAE framework with Distributionally Robust Optimization. DRAG leverages an adversarial neural weighter of the VAE's training states to account for the mismatch between the current data distribution and unseen parts of the environment. This allows the agent to construct semantically meaningful latent spaces beyond its immediate experience. Our approach improves state space coverage and downstream control performance on hard exploration environments, such as mazes and robotic control tasks involving walls to bypass, without pre-training or prior environment knowledge.
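
To make the reweighting idea concrete, the following is a minimal sketch of a distributionally robust $\beta$-VAE loss, under stated assumptions. It is not the DRAG implementation: DRAG trains a neural weighter adversarially, whereas this sketch swaps in the standard closed-form solution of a KL-regularized DRO problem, in which each sample's weight is proportional to the exponential of its loss. The function names `dro_weights` and `weighted_beta_vae_loss`, the `temperature` parameter, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math


def dro_weights(losses, temperature=1.0):
    """Adversarial sample weights for a KL-regularized DRO problem.

    Assumption (not from the paper): instead of a learned neural weighter,
    use the closed-form maximizer of
        sum_i w_i * loss_i - temperature * KL(w || uniform)
    over the probability simplex, which is w_i proportional to
    exp(loss_i / temperature).
    """
    shift = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - shift) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_beta_vae_loss(recon_errors, kl_terms, beta=1.0, temperature=1.0):
    """Reweighted beta-VAE objective over a batch of states.

    recon_errors: per-sample reconstruction errors
    kl_terms:     per-sample KL(q(z|x) || p(z)) values
    Returns the weighted loss and the weights used.
    """
    per_sample = [r + beta * k for r, k in zip(recon_errors, kl_terms)]
    weights = dro_weights(per_sample, temperature)
    loss = sum(w * l for w, l in zip(weights, per_sample))
    return loss, weights


# Toy batch: two frequently visited states (low error) and one rare state
# (high error). The rare state receives the largest weight.
loss, weights = weighted_beta_vae_loss(
    recon_errors=[0.1, 0.1, 2.0],
    kl_terms=[0.05, 0.05, 0.05],
    beta=1.0,
    temperature=1.0,
)
```

In this sketch, states that the auto-encoder currently represents poorly (typically rarely visited ones) are up-weighted, which shifts the effective training distribution away from the over-represented states. A larger `temperature` keeps the weights closer to uniform, while a smaller one concentrates them on the worst-represented states.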
