Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

  • 2025-01-08 18:42:48
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn

Abstract

We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required to arrive at a particular CoT. We present empirical evidence from state-of-the-art models exhibiting behaviors consistent with in-context search, and explore methods for producing Meta-CoT via process supervision, synthetic data generation, and search algorithms. We then outline a concrete pipeline for training a model to produce Meta-CoTs, incorporating instruction tuning with linearized search traces and reinforcement learning post-training. Finally, we discuss open research questions, including scaling laws, verifier roles, and the potential for discovering novel reasoning algorithms. This work provides a theoretical and practical roadmap to enable Meta-CoT in LLMs, paving the way for more powerful and human-like reasoning in artificial intelligence.

 
