Abstract
Large language models (LLMs) with billions of parameters exhibit in-context learning abilities, enabling few-shot learning on tasks that the model was not specifically trained for. Traditional models achieve breakthrough performance on language tasks, but do not perform well on basic reasoning benchmarks. However, a new in-context learning approach, chain-of-thought, has demonstrated strong multi-step reasoning abilities on these benchmarks. Research on LLM reasoning abilities started with the question of whether LLMs can solve grade school math word problems, and has expanded to other tasks in the past few years. This paper reviews the field of multi-step reasoning with LLMs. We propose a taxonomy that identifies different ways to generate, evaluate, and control multi-step reasoning. We provide in-depth coverage of core approaches and open problems, and we propose a research agenda for the near future. We find that multi-step reasoning approaches have progressed beyond math word problems, and can now successfully solve challenges in logic, combinatorial games, and robotics, sometimes by first generating code that is then executed by external tools. Many studies of multi-step methods use reinforcement learning for finetuning, external optimization loops, in-context reinforcement learning, and self-reflection.