Goal-Conditioned Data Augmentation for Offline Reinforcement Learning

Abstract

Offline reinforcement learning (RL) enables policy learning from pre-collected offline datasets, relaxing the need to interact directly with the environment. However, limited by the quality of offline datasets, it generally fails to learn high-quality policies from suboptimal datasets. To address datasets with insufficient optimal demonstrations, we introduce Goal-cOnditioned Data Augmentation (GODA), a novel goal-conditioned diffusion-based method for augmenting samples with higher quality. Leveraging recent advancements in generative modeling, GODA incorporates a novel return-oriented goal condition with various selection mechanisms. Specifically, we introduce a controllable scaling technique to provide enhanced return-based guidance during data sampling. GODA learns a comprehensive distribution representation of the original offline datasets while generating new data with selectively higher-return goals, thereby maximizing the utility of limited optimal demonstrations. Furthermore, we propose a novel adaptive gated conditioning method for processing noised inputs and conditions, enhancing the capture of goal-oriented guidance. We conduct experiments on the D4RL benchmark and real-world challenges, specifically traffic signal control (TSC) tasks, to demonstrate GODA's effectiveness in enhancing data quality and its superior performance compared to state-of-the-art data augmentation methods across various offline RL algorithms.
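
The idea of conditioning on selectively higher-return goals with a controllable scale can be illustrated with a minimal sketch. This is not the paper's implementation: the helper name `select_scaled_goals`, the `top_fraction` and `scale` parameters, and the rule for shifting each goal upward are all assumptions made for illustration. The sketch only shows how one might pick return goals from the best trajectories in a dataset and push them slightly higher before passing them as a condition to a generative sampler.

```python
import numpy as np


def select_scaled_goals(returns, top_fraction=0.1, scale=1.1, n_samples=5, seed=0):
    """Sample return goals from the top trajectories and shift them upward.

    Hypothetical helper: `top_fraction` and `scale` are illustrative knobs,
    not parameters taken from the paper.
    """
    returns = np.asarray(returns, dtype=float)
    if returns.size == 0:
        raise ValueError("returns must be non-empty")

    # Keep the k highest trajectory returns as candidate goals.
    k = max(1, int(np.ceil(top_fraction * returns.size)))
    top_returns = np.sort(returns)[-k:]

    # Sample goals (with replacement) from the candidates.
    rng = np.random.default_rng(seed)
    goals = rng.choice(top_returns, size=n_samples, replace=True)

    # Shift each goal upward by (scale - 1) times its magnitude, so the
    # scaled goal is higher than the original even for negative returns.
    return goals + (scale - 1.0) * np.abs(goals)
```

With `scale` set to 1.0 the goals are exactly the returns observed in the dataset; values above 1.0 ask the sampler for returns beyond what was observed, which is the kind of trade-off a controllable scaling factor is meant to expose.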
