Abstract
Traditional reinforcement learning and planning typically require vast amounts of data and training to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with tasks that require detailed planning and decision-making in complex action spaces. We introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to search over and update high-level strategies (as text), which are then refined and executed by low-level Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that optimizes the strategy through population-based self-play simulations without the need for any training data. We demonstrate the effectiveness of STRATEGIST in learning optimal strategies for competitive, multi-turn games with partial information, including Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents equipped with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments, and achieve comparable performance against human players.