Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning

Abstract

We consider the problem of offline reinforcement learning, where only a set of system transitions is made available for policy optimization. Following recent advances in the field, we consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts. This approach is vulnerable to the policy exploiting model errors, which can lead to catastrophic failures on the real system. The standard solution is to rely on ensembles for uncertainty heuristics and to avoid exploiting the model where it is too uncertain. We challenge the popular belief that we must resort to ensembles by showing that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark. We also analyze static metrics of model learning and draw conclusions about which model properties matter most for the final performance of the agent.
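The contrast between the two model families can be illustrated with a minimal sketch. Assuming Gaussian conditionals (the abstract does not specify the output distribution), an autoregressive model factorizes the next-state density one dimension at a time, p(s'|s,a) = ∏_i p(s'_i | s, a, s'_{<i}), so each predicted dimension can depend on the dimensions generated before it. For comparison, the sketch also includes one simple ensemble-disagreement score of the kind used as an uncertainty heuristic; published methods use several variants. The hand-written linear conditionals in `TOY_CONDS` stand in for learned networks, and all function names here are hypothetical. This is not the paper's implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a conditional maps (state, action, already-generated
# next-state dimensions) to the (mean, std) of the next dimension's Gaussian.
Conditional = Callable[[Sequence[float], Sequence[float], Sequence[float]], Tuple[float, float]]


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log-density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2


def ar_log_likelihood(conds: List[Conditional], s: Sequence[float],
                      a: Sequence[float], s_next: Sequence[float]) -> float:
    """log p(s'|s,a) as a sum of per-dimension conditional log-densities."""
    total = 0.0
    for i, cond in enumerate(conds):
        mean, std = cond(s, a, s_next[:i])  # condition on s'_{<i}
        total += gaussian_logpdf(s_next[i], mean, std)
    return total


def ar_sample(conds: List[Conditional], s: Sequence[float],
              a: Sequence[float], rng: random.Random) -> List[float]:
    """Sample s' one dimension at a time, feeding earlier samples forward."""
    s_next: List[float] = []
    for cond in conds:
        mean, std = cond(s, a, s_next)
        s_next.append(rng.gauss(mean, std))
    return s_next


def ensemble_disagreement(member_means: List[List[float]]) -> float:
    """One possible ensemble heuristic: the largest per-dimension standard
    deviation of the members' predicted next-state means."""
    n = len(member_means)
    worst = 0.0
    for d in range(len(member_means[0])):
        col = [m[d] for m in member_means]
        mu = sum(col) / n
        var = sum((c - mu) ** 2 for c in col) / n
        worst = max(worst, math.sqrt(var))
    return worst


# Toy, hand-written conditionals for a 2-D state and 1-D action (assumptions,
# standing in for learned networks).
def _cond_dim0(s, a, prev):
    return s[0] + a[0], 0.1


def _cond_dim1(s, a, prev):
    # The second dimension depends on the first sampled dimension.
    return s[1] + 0.5 * prev[0], 0.1


TOY_CONDS: List[Conditional] = [_cond_dim0, _cond_dim1]
```

For example, with `s = [1.0, 2.0]` and `a = [0.5]`, the first dimension is centered at 1.5 and the second at 2.0 + 0.5 · s'_0, so the second conditional shifts with whatever value was drawn for the first. A single model of this kind gives a likelihood directly, whereas the ensemble score only measures how much independently trained members disagree.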
