Variational OOD State Correction for Offline Reinforcement Learning

Abstract

The performance of offline reinforcement learning is significantly impacted by state distributional shift, and out-of-distribution (OOD) state correction is a popular approach to this problem. In this paper, we propose a novel method named Density-Aware Safety Perception (DASP) for OOD state correction. Specifically, our method encourages the agent to prioritize actions that lead to outcomes with higher data density, so that it either stays within in-distribution (safe) regions or returns to them. To achieve this, we optimize the objective within a variational framework that jointly considers the potential outcomes of decision-making and their density, thus providing crucial contextual information for safe decision-making. Finally, we validate the effectiveness and feasibility of the proposed method through extensive experimental evaluations on the offline MuJoCo and AntMaze suites.
