Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, they often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and other benchmarks. In all these natural language reasoning tasks, generating code using an LLM and reasoning using a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B, which uses chain-of-thought, by an absolute 15% top-1. Our code and data are publicly available at http://reasonwithpal.com/ .
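
The division of labor described above can be illustrated with a minimal sketch. This is not the released PAL code: the word problem, the hard-coded program text standing in for an LLM's output, and the helper name run_generated_program are all assumptions made for illustration. The point is only that the model's job ends at writing the program, and the arithmetic is carried out by the Python interpreter.

```python
# Hypothetical stand-in for the program an LLM might generate for the problem:
# "Olivia has $23. She buys five bagels at $3 each. How much money is left?"
# In a real pipeline this string would come from a few-shot prompted model.
GENERATED_PROGRAM = """
money_initial = 23
bagels = 5
bagel_cost = 3
money_spent = bagels * bagel_cost
answer = money_initial - money_spent
"""


def run_generated_program(program: str, result_name: str = "answer"):
    """Execute generated code in a fresh namespace and return one variable.

    Illustrative helper (an assumption, not the paper's API). exec() runs
    arbitrary code, so real use would need sandboxing.
    """
    namespace = {}
    exec(program, namespace)
    return namespace[result_name]


result = run_generated_program(GENERATED_PROGRAM)
print(result)  # 8
```

Because the interpreter computes 23 - 5 * 3, an arithmetic slip of the kind a model can make when producing the final number in free text cannot occur at this stage; only errors in the decomposition itself remain possible.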