Abstract
Large language models (LLMs) have demonstrated remarkable zero-shot generalization abilities: state-of-the-art chatbots can provide plausible answers to many common questions that arise in daily life. However, so far, LLMs cannot reliably solve long-horizon planning problems. By contrast, classical planners, once a problem is given in a formal representation, can use efficient search algorithms to quickly identify correct, or even optimal, plans. In an effort to get the best of both worlds, this paper introduces LLM+P, the first framework that incorporates the strengths of classical planners into LLMs. LLM+P takes as input a natural language description of a planning problem and returns, in natural language, a correct (or optimal) plan for solving that problem. LLM+P does so by first converting the language description into a file written in the planning domain definition language (PDDL), then leveraging classical planners to quickly find a solution, and finally translating the found solution back into natural language. Along with LLM+P, we define a diverse set of benchmark problems taken from common planning scenarios. Via a comprehensive set of experiments on these benchmark problems, we find that LLM+P is able to provide optimal solutions for most problems, while LLMs fail to provide even feasible plans for most problems.\footnote{The code and results are publicly available at https://github.com/Cranial-XIX/llm-pddl.git.}
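To make the three-stage pipeline (language to formal problem, planner, plan to language) concrete, here is a minimal Python sketch under loudly stated assumptions. The two LLM translation steps are replaced by hypothetical stubs passed in as callables (`to_problem`, `to_text`). PDDL serialization is skipped: the grounded problem is held directly as Python sets of atoms. A breadth-first search (`bfs_plan`), which returns a shortest plan when every action costs one, stands in for a classical planner. The toy Blocksworld problem and every name below (`llm_plus_p_sketch`, `toy_parser`, `toy_verbalizer`, and the others) are illustrative assumptions, not part of the LLM+P code.

```python
from collections import deque
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]     # atoms that must hold before the action
    add: FrozenSet[str]     # atoms made true by the action
    delete: FrozenSet[str]  # atoms made false by the action


# A grounded STRIPS problem: (initial atoms, goal atoms, ground actions).
Problem = Tuple[FrozenSet[str], FrozenSet[str], List[Action]]


def blocksworld_actions(blocks: List[str]) -> List[Action]:
    """Ground the four standard Blocksworld operators over the given blocks."""
    acts = []
    for x in blocks:
        acts.append(Action(f"pick-up {x}",
                           frozenset({f"clear {x}", f"ontable {x}", "handempty"}),
                           frozenset({f"holding {x}"}),
                           frozenset({f"clear {x}", f"ontable {x}", "handempty"})))
        acts.append(Action(f"put-down {x}",
                           frozenset({f"holding {x}"}),
                           frozenset({f"clear {x}", f"ontable {x}", "handempty"}),
                           frozenset({f"holding {x}"})))
    for x, y in permutations(blocks, 2):  # ordered pairs with x != y
        acts.append(Action(f"stack {x} {y}",
                           frozenset({f"holding {x}", f"clear {y}"}),
                           frozenset({f"on {x} {y}", f"clear {x}", "handempty"}),
                           frozenset({f"holding {x}", f"clear {y}"})))
        acts.append(Action(f"unstack {x} {y}",
                           frozenset({f"on {x} {y}", f"clear {x}", "handempty"}),
                           frozenset({f"holding {x}", f"clear {y}"}),
                           frozenset({f"on {x} {y}", f"clear {x}", "handempty"})))
    return acts


def bfs_plan(problem: Problem) -> Optional[List[str]]:
    """Breadth-first search over states: a shortest plan (unit costs), or None."""
    init, goal, actions = problem
    parent: dict = {init: None}  # state -> (previous state, action name)
    frontier = deque([init])
    while frontier:
        state = frontier.popleft()
        if goal <= state:  # every goal atom holds
            plan = []
            while parent[state] is not None:
                state, name = parent[state]
                plan.append(name)
            return plan[::-1]
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in parent:
                    parent[nxt] = (state, a.name)
                    frontier.append(nxt)
    return None  # goal unreachable


def llm_plus_p_sketch(description: str,
                      to_problem: Callable[[str], Problem],
                      to_text: Callable[[List[str]], str]) -> Optional[str]:
    """Language -> formal problem -> planner -> language."""
    problem = to_problem(description)  # hypothetical stand-in for the LLM translation
    plan = bfs_plan(problem)           # toy stand-in for a classical planner
    return None if plan is None else to_text(plan)


def toy_parser(text: str) -> Problem:
    # Hypothetical stub: ignores `text` and returns a hand-written problem.
    init = frozenset({"on a b", "ontable b", "clear a", "handempty"})
    goal = frozenset({"on b a"})
    return init, goal, blocksworld_actions(["a", "b"])


def toy_verbalizer(plan: List[str]) -> str:
    # Hypothetical stub for translating the plan back into language.
    return ", then ".join(plan)


print(llm_plus_p_sketch("Block a is on block b. Put b on a.", toy_parser, toy_verbalizer))
# prints: unstack a b, then put-down a, then pick-up b, then stack b a
```

BFS is used here only because it is short and its optimality for unit-cost plans is easy to check; it scales poorly with the number of objects, which is why practical classical planners rely on heuristic search instead.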