GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

Abstract

Offline Reinforcement Learning (Offline RL) presents challenges of learning effective decision-making policies from static datasets without any online interactions. Data augmentation techniques, such as noise injection and data synthesizing, aim to improve Q-function approximation by smoothing the learned state-action region. However, these methods often fall short of directly improving the quality of offline datasets, leading to suboptimal results. In response, we introduce GTA, Generative Trajectory Augmentation, a novel generative data augmentation approach designed to enrich offline data by augmenting trajectories to be both high-rewarding and dynamically plausible. GTA applies a diffusion model within the data augmentation framework. GTA partially noises original trajectories and then denoises them with classifier-free guidance via conditioning on amplified return value. Our results show that GTA, as a general data augmentation strategy, enhances the performance of widely used offline RL algorithms in both dense and sparse reward settings. Furthermore, we conduct a quality analysis of data augmented by GTA and demonstrate that GTA improves the quality of the data. Our code is available at https://github.com/Jaewoopudding/GTA
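
The partial-noising and guided-denoising steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). Everything named here is an assumption made for illustration: the linear noise schedule, the deterministic DDIM-style update, the multiplicative form of return amplification, the parameter names noise_frac, amplify, and guidance_weight, and the placeholder toy_eps_model that stands in for a trained conditional diffusion model.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    # Cumulative product of (1 - beta_t) for a linear schedule (assumed).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def partial_noise(x0, t, alpha_bar, rng):
    # Forward-diffuse a clean trajectory x0 to an intermediate step t.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def guided_eps(eps_model, x_t, t, cond, guidance_weight):
    # Classifier-free guidance: move the unconditional noise estimate
    # toward the conditional one, scaled by guidance_weight.
    eps_uncond = eps_model(x_t, t, None)
    eps_cond = eps_model(x_t, t, cond)
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)

def augment(x0, ret, eps_model, alpha_bar, noise_frac=0.5,
            amplify=1.2, guidance_weight=2.0, rng=None):
    # Noise only part of the way, then denoise while conditioning
    # on an amplified return (multiplicative form is an assumption).
    rng = np.random.default_rng(0) if rng is None else rng
    t_start = max(int(noise_frac * len(alpha_bar)) - 1, 0)
    x = partial_noise(x0, t_start, alpha_bar, rng)
    target_return = amplify * ret
    for t in range(t_start, -1, -1):
        eps = guided_eps(eps_model, x, t, target_return, guidance_weight)
        # Estimate the clean trajectory implied by the noise prediction.
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        if t == 0:
            x = x0_hat
        else:
            # Deterministic DDIM-style step to t - 1 (assumed sampler).
            x = (np.sqrt(alpha_bar[t - 1]) * x0_hat
                 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps)
    return x

def toy_eps_model(x_t, t, cond):
    # Hypothetical placeholder for a trained noise-prediction network.
    return np.zeros_like(x_t)

alpha_bar = make_alpha_bar(100)
trajectory = np.ones((8, 4))  # (horizon, state-action dim), toy data
augmented = augment(trajectory, ret=10.0, eps_model=toy_eps_model,
                    alpha_bar=alpha_bar, noise_frac=0.25)
```

In this sketch, a small noise_frac keeps the augmented trajectory close to the original, while a larger one gives the guided denoiser more room to move it toward the amplified return.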
