A Clean Slate for Offline Reinforcement Learning

Abstract

Progress in offline reinforcement learning (RL) has been impeded by ambiguous problem definitions and entangled algorithmic designs, resulting in inconsistent implementations, insufficient ablations, and unfair evaluations. Although offline RL explicitly avoids environment interaction, prior methods frequently employ extensive, undocumented online evaluation for hyperparameter tuning, complicating method comparisons. Moreover, existing reference implementations differ significantly in boilerplate code, obscuring their core algorithmic contributions. We address these challenges by first introducing a rigorous taxonomy and a transparent evaluation protocol that explicitly quantifies online tuning budgets. To resolve opaque algorithmic design, we provide clean, minimalistic, single-file implementations of various model-free and model-based offline RL methods, significantly enhancing clarity and achieving substantial speed-ups. Leveraging these streamlined implementations, we propose Unifloral, a unified algorithm that encapsulates diverse prior approaches within a single, comprehensive hyperparameter space, enabling algorithm development in a shared hyperparameter space. Using Unifloral with our rigorous evaluation protocol, we develop two novel algorithms, TD3-AWR (model-free) and MoBRAC (model-based), which substantially outperform established baselines. Our implementation is publicly available at https://github.com/EmptyJackson/unifloral.
