Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

Abstract

Offline reinforcement learning (RL) optimizes the policy on a previously collected dataset without any interactions with the environment, yet usually suffers from the distributional shift problem. To mitigate this issue, a typical solution is to impose a policy constraint on a policy improvement objective. However, existing methods generally adopt a "one-size-fits-all" practice, i.e., keeping only a single improvement-constraint balance for all the samples in a mini-batch or even the entire offline dataset. In this work, we argue that different samples should be treated with different policy constraint intensities. Based on this idea, a novel plug-in approach named Guided Offline RL (GORL) is proposed. GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and policy constraint for every sample. We theoretically prove that the guidance provided by our method is rational and near-optimal. Extensive experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements.
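To make the contrast between a single improvement-constraint balance and a per-sample balance concrete, the following is a minimal sketch and not GORL's actual objective, which the abstract does not specify. It assumes a generic loss of the form "negative Q-value plus weighted constraint error"; the function names, the loss form, and the example numbers are all hypothetical. In GORL the per-sample weights would come from the guiding network, whereas here they are passed in as plain numbers.

```python
from typing import Sequence


def fixed_weight_loss(q_values: Sequence[float],
                      constraint_errors: Sequence[float],
                      alpha: float) -> float:
    """Assumed 'one-size-fits-all' loss: one weight alpha for every sample."""
    n = len(q_values)
    return sum(-q + alpha * e for q, e in zip(q_values, constraint_errors)) / n


def per_sample_weight_loss(q_values: Sequence[float],
                           constraint_errors: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Assumed adaptive loss: each sample gets its own constraint weight.

    The weights stand in for the output of a guiding network (hypothetical
    here); this function only shows how they would enter the objective.
    """
    n = len(q_values)
    return sum(-q + w * e
               for q, e, w in zip(q_values, constraint_errors, weights)) / n


if __name__ == "__main__":
    q = [1.0, 2.0]        # illustrative critic values for two samples
    err = [0.5, 0.25]     # illustrative constraint errors, e.g. squared action distance
    print(fixed_weight_loss(q, err, alpha=2.0))
    print(per_sample_weight_loss(q, err, weights=[0.0, 4.0]))
```

With identical weights for every sample the two functions coincide, so the fixed-weight setting is a special case of the per-sample one.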
