Abstract
Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them to complex, multi-step, long-horizon tasks remains a challenge. Recent work has found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a framework that incorporates explicit planning into LLM-based agents and introduces a scalable method to enhance plan generation through a novel synthetic data generation method. Plan-and-Act consists of a Planner model, which generates structured, high-level plans to achieve user goals, and an Executor model, which translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 54% success rate on the WebArena-Lite benchmark.
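The Planner/Executor separation described above can be illustrated with a minimal sketch. The abstract does not specify any interfaces, so every name here (`run_plan_and_act`, `toy_planner`, `toy_executor`, `toy_env`) is a hypothetical stand-in, the "models" are plain functions rather than LLMs, and the single-pass loop is an assumption made for brevity rather than a description of the actual system.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not taken from the paper):
Planner = Callable[[str], List[str]]   # user goal -> high-level plan steps
Executor = Callable[[str, str], str]   # (plan step, observation) -> action
Environment = Callable[[str], str]     # action -> next observation


def run_plan_and_act(goal: str, planner: Planner, executor: Executor,
                     env: Environment, initial_observation: str = "") -> List[str]:
    """Generate a plan once, then ground each step into an environment action."""
    plan = planner(goal)
    observation = initial_observation
    actions: List[str] = []
    for step in plan:
        # The executor sees only the current step and the latest observation.
        action = executor(step, observation)
        observation = env(action)
        actions.append(action)
    return actions


# Toy stand-ins for the two models and a web environment.
def toy_planner(goal: str) -> List[str]:
    return [f"search: {goal}", "click: first result"]


def toy_executor(step: str, observation: str) -> str:
    kind, _, arg = step.partition(": ")
    return f"{kind.upper()}({arg!r})"


def toy_env(action: str) -> str:
    return f"page after {action}"


actions = run_plan_and_act("cheapest laptop", toy_planner, toy_executor, toy_env)
print(actions)
```

Keeping the planner and executor behind separate callables mirrors the division of labor in the text: the plan stays environment-agnostic, and only the executor needs to know the action format.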