Natural Language Reinforcement Learning

Abstract

Reinforcement Learning (RL) mathematically formulates decision-making with the Markov Decision Process (MDP). With MDPs, researchers have achieved remarkable breakthroughs across various domains, including games, robotics, and language models. This paper seeks a new possibility, Natural Language Reinforcement Learning (NLRL), by extending the traditional MDP to a natural language-based representation space. Specifically, NLRL redefines RL principles, including task objectives, policy, value function, Bellman equation, and policy iteration, into their language counterparts. With recent advancements in large language models (LLMs), NLRL can be practically implemented to achieve RL-like policy and value improvement by either pure prompting or gradient-based training. Experiments on Maze, Breakthrough, and Tic-Tac-Toe games demonstrate the effectiveness, efficiency, and interpretability of the NLRL framework across diverse use cases. Our code will be released at https://github.com/waterhorse1/Natural-language-RL.
