ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

  • 2025-02-10 18:51:47
  • Ling Yang, Zhaochen Yu, Bin Cui, Mengdi Wang

Abstract

We show that hierarchical LLM reasoning via scaling thought templates can effectively optimize the reasoning search space and surpass the mathematical reasoning capabilities of powerful LLMs such as OpenAI o1-preview and DeepSeek V3. We train our ReasonFlux-32B model with only 8 GPUs and introduce three innovations: (i) a structured and generic thought template library containing around 500 high-level thought templates that generalize to similar or related reasoning problems; (ii) hierarchical reinforcement learning on a sequence of thought templates instead of long CoTs, which optimizes a base LLM to plan an optimal template trajectory for handling complex problems step by step; (iii) a new inference scaling system that enables hierarchical LLM reasoning by adaptively scaling thought templates at inference time. With a template trajectory containing sequential thought templates, ReasonFlux-32B advances math reasoning capabilities to state-of-the-art levels. On the MATH benchmark, it achieves an accuracy of 91.2% and surpasses o1-preview by 6.7%. On the American Invitational Mathematics Examination (AIME) benchmark, ReasonFlux-32B solves an average of 56.7% of problems, surpassing o1-preview and DeepSeek-V3 by 27% and 45%, respectively. Code: https://github.com/Gen-Verse/ReasonFlux
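The abstract frames reasoning as a trajectory of high-level thought templates drawn from a library, rather than one long chain of thought. The Python sketch below illustrates only that data flow, under loudly stated assumptions: the `ThoughtTemplate` fields, the example templates, and the tag-overlap `retrieve` function are hypothetical, and the fixed list of sub-problem tags stands in for the RL-trained planner that, per the abstract, chooses the trajectory. It is not the ReasonFlux implementation; the linked repository is the authoritative source.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtTemplate:
    """A hypothetical high-level thought template (fields are illustrative only)."""
    name: str
    tags: set[str]
    steps: list[str] = field(default_factory=list)


def retrieve(library: list[ThoughtTemplate], query_tags: set[str], k: int = 1) -> list[ThoughtTemplate]:
    """Return up to k templates ranked by tag overlap with the query.

    Assumption: simple set overlap is a stand-in for whatever retrieval
    ReasonFlux actually uses. Ties are broken alphabetically by name.
    """
    scored = [(len(t.tags & query_tags), t.name, t) for t in library]
    scored = [s for s in scored if s[0] > 0]  # drop templates with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, _, t in scored[:k]]


def plan_trajectory(library: list[ThoughtTemplate], subproblems: list[set[str]]) -> list[str]:
    """Build a template trajectory: one retrieved template per sub-problem, in order.

    Assumption: the sub-problem decomposition is given; in the paper an LLM
    trained with hierarchical RL plans the trajectory instead.
    """
    trajectory = []
    for tags in subproblems:
        best = retrieve(library, tags, k=1)
        trajectory.append(best[0].name if best else "<no template>")
    return trajectory


if __name__ == "__main__":
    # Hypothetical templates for illustration; not taken from the ReasonFlux library.
    library = [
        ThoughtTemplate("Vieta substitution", {"polynomial", "roots"}),
        ThoughtTemplate("Telescoping sum", {"series", "sum"}),
        ThoughtTemplate("Casework on parity", {"integers", "parity"}),
    ]
    print(plan_trajectory(library, [{"polynomial", "roots"}, {"sum", "series"}]))
```

Both the planner and the retriever are replaced here with deterministic stand-ins so the control flow (problem, then sequence of templates, then step-by-step solving) is easy to inspect; the adaptive inference-time scaling the abstract describes would sit around this loop and is not modeled.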
