Improving and Benchmarking Offline Reinforcement Learning Algorithms

Abstract

Recently, Offline Reinforcement Learning (RL) has achieved remarkable progress with the emergence of various algorithms and datasets. However, these methods usually focus on algorithmic advancements and ignore that many low-level implementation choices considerably influence, or even drive, the final performance. As a result, it becomes hard to attribute the progress in Offline RL, because these choices are not sufficiently discussed and aligned in the literature. In addition, papers focusing on one dataset (e.g., D4RL) often ignore algorithms proposed on another (e.g., RL Unplugged). This isolates algorithms developed on different datasets and might slow down overall progress. Therefore, this work aims to bridge the gaps caused by low-level choices and datasets. To this end, we empirically investigate 20 implementation choices using three representative algorithms (i.e., CQL, CRR, and IQL) and present a guidebook for choosing implementations. Following the guidebook, we find two variants, CRR+ and CQL+, that achieve new state-of-the-art results on D4RL. Moreover, we benchmark eight popular offline RL algorithms across datasets under a unified training and evaluation framework. The findings are inspiring: the success of a learning paradigm depends heavily on the data distribution, and some previous conclusions are biased by the dataset used. Our code is available at https://github.com/sail-sg/offbench.