Abstract
Sample efficiency remains a major obstacle for real-world adoption of reinforcement learning (RL): success has been limited to settings where simulators provide access to essentially unlimited environment interactions, which in reality are typically costly or dangerous to obtain. Offline RL in principle offers a solution by exploiting offline data to learn a near-optimal policy before deployment. In practice, however, current offline RL methods rely on extensive online interactions for hyperparameter tuning, and have no reliable bound on their initial online performance. To address these two issues, we introduce two algorithms. Firstly, SOReL: an algorithm for safe offline reinforcement learning. Using only offline data, our Bayesian approach infers a posterior over environment dynamics to obtain a reliable estimate of the online performance via the posterior predictive uncertainty. Crucially, all hyperparameters are also tuned fully offline. Secondly, we introduce TOReL: a tuning for offline reinforcement learning algorithm that extends our information-rate-based offline hyperparameter tuning methods to general offline RL approaches. Our empirical evaluation confirms SOReL's ability to accurately estimate regret in the Bayesian setting, whilst TOReL's offline hyperparameter tuning achieves competitive performance with the best online hyperparameter tuning methods using only offline data. Thus, SOReL and TOReL make a significant step towards safe and reliable offline RL, unlocking the potential for RL in the real world. Our implementations are publicly available: https://github.com/CWibault/sorel_torel.