Measuring Data Quality for Dataset Selection in Offline Reinforcement Learning

Abstract

Recently developed offline reinforcement learning algorithms have made it possible to learn policies directly from pre-collected datasets, giving rise to a new dilemma for practitioners: since the performance the algorithms are able to deliver depends greatly on the dataset that is presented to them, practitioners need to pick the right dataset among the available ones. This problem has so far not been discussed in the corresponding literature. We discuss ideas for how to select promising datasets and propose three very simple indicators: estimated relative return improvement (ERI), estimated action stochasticity (EAS), and a combination of the two (COI). We show empirically that, despite their simplicity, they can be used very effectively for dataset selection.
