Can Wikipedia Help Offline Reinforcement Learning?

  • 2022-07-24 04:13:16
  • Machel Reid, Yutaro Yamada, Shixiang Shane Gu


Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling, with improved results as a result of the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence speeds. In this paper, we look to take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of pre-trained sequence models on other domains (vision, language) when fine-tuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on a variety of environments, accelerating training by 3-6x and achieving state-of-the-art performance in a variety of tasks using Wikipedia-pretrained and GPT2 language models. We hope that this work not only brings light to the potentials of leveraging generic sequence modeling techniques and pre-trained models for RL, but also inspires future work on sharing knowledge between generative modeling tasks of completely different domains.

