Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning

Abstract

Offline reinforcement learning (RL) aims to learn a policy from a static dataset without further interaction with the environment. Collecting sufficiently large datasets for offline RL is laborious, since it requires an enormous number of interactions with the environment, and it becomes difficult when such interaction is restricted. Hence, learning the best possible policy from a minimal static dataset is a crucial issue in offline RL, analogous to the sample-efficiency problem in online RL. In this paper, we propose a simple yet effective plug-and-play pretraining method that initializes the features of a $Q$-network to enhance data efficiency in offline RL. Specifically, we introduce a shared $Q$-network structure that outputs predictions of both the next state and the $Q$-value. We pretrain the shared $Q$-network on a supervised regression task that predicts the next state, and then train it with various offline RL methods. Through extensive experiments, we empirically demonstrate that our method improves the performance of existing popular offline RL methods on the D4RL, Robomimic, and V-D4RL benchmarks. Furthermore, we show on the D4RL and ExoRL benchmarks that our method significantly boosts data-efficient offline RL across various data qualities and data distributions. Notably, our method, using only 10% of the dataset, outperforms standard algorithms trained on the full datasets.
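The abstract describes the shared $Q$-network only at a high level. The following is a minimal pure-Python sketch, under our own assumptions, of the two ideas it names: a single trunk whose features feed both a next-state head and a $Q$-value head, and a supervised pretraining step that regresses the next state. The one-hidden-layer tanh trunk, the concatenated input [s; a], the names `SharedQNetwork`, `pretrain_step`, `next_state_mse`, and `make_toy_dataset`, and the toy linear dynamics are all hypothetical choices for illustration, not the paper's implementation. The subsequent offline RL training is not shown.

```python
import math
import random


def _matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class SharedQNetwork:
    """Hypothetical shared Q-network: one trunk, two heads (next state, Q-value)."""

    def __init__(self, state_dim, action_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

        self.W = init(hidden_dim, state_dim + action_dim)  # shared trunk (assumed 1 layer)
        self.V = init(state_dim, hidden_dim)               # next-state head
        self.u = init(1, hidden_dim)[0]                    # Q-value head

    def features(self, s, a):
        """Shared feature h = tanh(W [s; a]) used by both heads."""
        x = list(s) + list(a)
        return x, [math.tanh(z) for z in _matvec(self.W, x)]

    def predict_next_state(self, s, a):
        return _matvec(self.V, self.features(s, a)[1])

    def q_value(self, s, a):
        return sum(ui * hi for ui, hi in zip(self.u, self.features(s, a)[1]))

    def pretrain_step(self, batch, lr=0.1):
        """One full-batch gradient step on the mean squared next-state error.

        Only the trunk W and the next-state head V are updated; the Q head u
        is left for a downstream offline RL method to train.
        """
        n = len(batch)
        gW = [[0.0] * len(row) for row in self.W]
        gV = [[0.0] * len(row) for row in self.V]
        for s, a, s_next in batch:
            x, h = self.features(s, a)
            err = [p - t for p, t in zip(_matvec(self.V, h), s_next)]
            for i, e in enumerate(err):
                for j, hj in enumerate(h):
                    gV[i][j] += 2.0 * e * hj / n
            for j, hj in enumerate(h):
                # Chain rule: dL/dh_j from the head, times tanh'(z_j) = 1 - h_j^2.
                g = sum(2.0 * e * self.V[i][j] for i, e in enumerate(err)) * (1.0 - hj * hj) / n
                for k, xk in enumerate(x):
                    gW[j][k] += g * xk
        # Apply the accumulated gradients only after the whole batch is processed.
        for params, grads in ((self.V, gV), (self.W, gW)):
            for row, grow in zip(params, grads):
                for k, gk in enumerate(grow):
                    row[k] -= lr * gk


def next_state_mse(net, batch):
    """Mean over transitions of the squared next-state prediction error."""
    return sum(
        sum((p - t) ** 2 for p, t in zip(net.predict_next_state(s, a), s_next))
        for s, a, s_next in batch
    ) / len(batch)


def make_toy_dataset(n, seed=0):
    """Hypothetical offline dataset of (s, a, s') tuples from made-up linear dynamics."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        a = [rng.uniform(-1, 1)]
        s_next = [0.8 * s[0] + 0.2 * a[0], 0.8 * s[1] - 0.2 * a[0]]
        data.append((s, a, s_next))
    return data


data = make_toy_dataset(64)
net = SharedQNetwork(state_dim=2, action_dim=1, hidden_dim=8)
print(f"next-state MSE before pretraining: {next_state_mse(net, data):.4f}")
for _ in range(300):
    net.pretrain_step(data)
print(f"next-state MSE after pretraining:  {next_state_mse(net, data):.4f}")
```

In this sketch, pretraining touches only the trunk and the next-state head, so the $Q$ head starts from features that already encode the (toy) dynamics; an offline RL objective would then train the $Q$ head, and presumably the shared trunk, from that initialization.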
