Behavior Regularized Offline Reinforcement Learning

Abstract

In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment. However, in many real-world applications, access to the environment is limited to a fixed offline dataset of logged experience. In such settings, standard RL algorithms have been shown to diverge or otherwise yield poor performance. Accordingly, recent work has suggested a number of remedies to these issues. In this work, we introduce a general framework, behavior regularized actor critic (BRAC), to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks. Surprisingly, we find that many of the technical complexities introduced in recent methods are unnecessary to achieve strong performance. Additional ablations provide insights into which design choices matter most in the offline RL setting.
