On Learning Verifiers for Chain-of-Thought Reasoning

Abstract

Chain-of-Thought reasoning has emerged as a powerful approach for solvingcomplex mathematical and logical problems. However, it can often veer off trackthrough incorrect or unsubstantiated inferences. Formal mathematical reasoning,which can be checked with a formal verifier, is one approach to addressing thisissue. However, currently LLMs are simply not good enough to solve complexproblems in a formal way, and even just formalizing an informal problemstatement can be challenging. Motivated by this fact, in this work we considerthe problem of learning reliable verifiers for natural languageChain-of-Thought reasoning. That is, given a problem statement and step-by-stepsolution in natural language, the aim of the verifier is to output [Yes] if thereasoning steps in the solution are all valid, and [No] otherwise. In this workwe give a formal PAC-learning framework for studying this problem. We proposeand analyze several natural verification goals, at different levels ofstrength, in this framework. We provide sample complexity upper-bounds forlearning verifiers satisfying these goals, as well as lower-bound andimpossibility results for learning other natural verification objectiveswithout additional assumptions.

Quick Read (beta)

loading the full paper ...