Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Abstract

Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection can be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces $\textbf{La}$nguage Models for $\textbf{Mo}$tion Control ($\textbf{LaMo}$), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using a non-linear MLP transformation, instead of linear projections, to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original language abilities. Empirical results indicate that $\textbf{LaMo}$ achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Our project website is https://lamo2023.github.io.
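
To make component (2) concrete, the following is a minimal, dependency-free sketch of a LoRA-style linear layer: the pre-trained weight matrix stays frozen and only a low-rank correction is trained. It is an illustration under stated assumptions, not the paper's implementation. The function names, the rank, the scale factor, and the 768-dimensional layer size are all hypothetical choices made for this example. The zero initialization of the second low-rank factor follows the usual LoRA convention, so the adapted layer starts out identical to the pre-trained one.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def make_adapter(d_in, d_out, rank, seed=0):
    """Create the two low-rank factors (hypothetical helper).

    lora_a (d_in x rank) gets small random values; lora_b (rank x d_out)
    starts at zero, so the low-rank update is zero before training.
    """
    rng = random.Random(seed)
    lora_a = [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(d_in)]
    lora_b = [[0.0] * d_out for _ in range(rank)]
    return lora_a, lora_b


def lora_linear(x, w_frozen, lora_a, lora_b, scale=1.0):
    """Compute x @ W + scale * (x @ A @ B).

    W is the frozen pre-trained weight; only A and B would receive
    gradient updates during fine-tuning.
    """
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]


def trainable_fraction(d_in, d_out, rank):
    """Ratio of adapter parameters to frozen parameters for one layer."""
    return rank * (d_in + d_out) / (d_in * d_out)


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # stand-in for a pre-trained weight
    x = [[0.5, -1.0, 2.0]]                     # one input row
    a, b = make_adapter(d_in=3, d_out=2, rank=1)
    print(lora_linear(x, w, a, b))             # equals x @ w before any training
    print(trainable_fraction(768, 768, 8))     # adapter size relative to the layer
```

The point of the design is visible in `trainable_fraction`: for a square layer, the adapter adds a number of trainable parameters proportional to the rank rather than to the layer width, which is why a small in-domain dataset can adapt the model without overwriting the pre-trained weights.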
