Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Abstract

We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required to arrive at a particular CoT. We present empirical evidence from state-of-the-art models exhibiting behaviors consistent with in-context search, and explore methods for producing Meta-CoT via process supervision, synthetic data generation, and search algorithms. We then outline a concrete pipeline for training a model to produce Meta-CoTs, incorporating instruction tuning with linearized search traces and reinforcement learning post-training. Finally, we discuss open research questions, including scaling laws, the role of verifiers, and the potential for discovering novel reasoning algorithms. This work provides a theoretical and practical roadmap to enable Meta-CoT in LLMs, paving the way for more powerful and human-like reasoning in artificial intelligence.
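
To make the idea of a "linearized search trace" concrete, the toy sketch below runs a depth-first search and flattens every step it takes, including dead ends and backtracking, into one training string. Everything here is an illustrative assumption rather than the paper's data format: the subset-sum task, DFS as the search algorithm, the tags (<try>, <prune>, <backtrack>, <solved>, <meta-cot>), and the helper names dfs_trace and linearize are all hypothetical.

```python
from typing import List, Tuple


def dfs_trace(nums: List[int], target: int) -> Tuple[List[str], List[int]]:
    """Hypothetical toy search: DFS over subsets of `nums` (taken in order)
    that sum to `target`. Returns the full trace, dead ends included,
    plus the first solution found (empty list if none exists)."""
    trace: List[str] = []
    path: List[int] = []

    def search(start: int, remaining: int) -> bool:
        if remaining == 0:
            trace.append(f"<solved> {path}")
            return True
        for i in range(start, len(nums)):
            x = nums[i]
            if x > remaining:
                # Positive integers only: x can never fit, so skip this branch.
                trace.append(f"<prune> {x} exceeds remaining {remaining}")
                continue
            path.append(x)
            trace.append(f"<try> add {x}, remaining {remaining - x}")
            if search(i + 1, remaining - x):
                return True
            path.pop()
            trace.append(f"<backtrack> remove {x}")
        return False

    search(0, target)
    return trace, list(path)


def linearize(question: str, trace: List[str], answer: List[int]) -> str:
    """Flatten the search into a single sequence: the question, the search
    process (a stand-in for Meta-CoT), then the final answer."""
    meta_cot = "\n".join(trace)
    return f"Question: {question}\n<meta-cot>\n{meta_cot}\n</meta-cot>\nAnswer: {answer}"


trace, answer = dfs_trace([5, 3, 4, 2], target=6)
print(linearize("Pick numbers from [5, 3, 4, 2] summing to 6.", trace, answer))
```

Unlike a plain CoT, which would contain only the successful path (4 then 2), the linearized sequence also keeps the failed attempts starting from 5 and from 3 and the backtracking out of them; that exploration is the kind of behavior instruction tuning on such traces is meant to teach.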
