DYSTIL: Dynamic Strategy Induction with Large Language Models for Reinforcement Learning

Abstract

Reinforcement learning from expert demonstrations has long remained a challenging research problem, and existing state-of-the-art methods that use behavioral cloning followed by further RL training often suffer from poor generalization, low sample efficiency, and poor model interpretability. Inspired by the strong reasoning abilities of large language models (LLMs), we propose a novel strategy-based reinforcement learning framework integrated with LLMs, called DYnamic STrategy Induction with Llms for reinforcement learning (DYSTIL), to overcome these limitations. DYSTIL dynamically queries a strategy-generating LLM to induce textual strategies based on advantage estimations and expert demonstrations, and gradually internalizes the induced strategies into the RL agent through policy optimization, improving its performance by boosting policy generalization and sample efficiency. It also provides a direct textual channel for observing and interpreting the evolution of the policy's underlying strategies during training. We test DYSTIL on challenging RL environments from Minigrid and BabyAI, and empirically demonstrate that DYSTIL significantly outperforms state-of-the-art baseline methods by 17.75% in average success rate while also enjoying higher sample efficiency during the learning process.
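To make the strategy-induction step more concrete, the following is a minimal sketch of how a query to a strategy-generating LLM might be assembled from expert demonstrations and advantage estimates. It is an illustration under stated assumptions, not the procedure used in DYSTIL: the abstract does not specify how advantage estimates enter the query, so selecting the lowest-advantage steps is an assumption, and every name here (`Step`, `lowest_advantage_steps`, `build_prompt`, `induce_strategies`, the `llm` callable) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    observation: str   # textual description of what the agent observed
    action: str        # action the agent took
    advantage: float   # advantage estimate for that action


def lowest_advantage_steps(steps: Sequence[Step], k: int) -> List[Step]:
    """Return the k steps with the lowest advantage estimates (assumed heuristic)."""
    return sorted(steps, key=lambda s: s.advantage)[:k]


def build_prompt(demos: Sequence[str], strategies: Sequence[str],
                 weak_steps: Sequence[Step]) -> str:
    """Assemble a plain-text query for a strategy-generating LLM."""
    lines = ["Expert demonstrations:"]
    lines += [f"- {d}" for d in demos]
    lines.append("Current strategies:")
    lines += [f"- {s}" for s in strategies]
    lines.append("Steps with low estimated advantage:")
    lines += [
        f"- saw '{s.observation}', did '{s.action}' (A={s.advantage:+.2f})"
        for s in weak_steps
    ]
    lines.append("Revise the strategy list to address these steps.")
    return "\n".join(lines)


def induce_strategies(llm: Callable[[str], str], demos: Sequence[str],
                      strategies: Sequence[str], steps: Sequence[Step],
                      k: int = 2) -> List[str]:
    """Query the LLM and parse one strategy per non-empty reply line."""
    prompt = build_prompt(demos, strategies, lowest_advantage_steps(steps, k))
    reply = llm(prompt)
    return [ln.strip("- ").strip() for ln in reply.splitlines() if ln.strip()]
```

Here `llm` is any function that maps a prompt string to a reply string, so a real model client or a stub can be passed in. In a training loop of the kind the abstract describes, the returned strategy list would be supplied to the agent and refreshed periodically as policy optimization proceeds.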
