Simple Ingredients for Offline Reinforcement Learning

Abstract

Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer. In light of this finding, we conduct a large empirical study where we formulate and test several hypotheses to explain this failure. Surprisingly, we find that scale, more than algorithmic considerations, is the key factor influencing performance. We show that simple methods like AWAC and IQL with increased network size overcome the paradoxical failure modes from the inclusion of additional data in MOOD, and notably outperform prior state-of-the-art algorithms on the canonical D4RL benchmark.
