ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

Abstract

We show that hierarchical LLM reasoning via scaling thought templates can effectively optimize the reasoning search space and outperform powerful LLMs such as OpenAI o1-preview and DeepSeek-V3 in mathematical reasoning. We train our ReasonFlux-32B model with only 8 GPUs and introduce three innovations: (i) a structured and generic thought template library containing around 500 high-level thought templates that can generalize to similar or related reasoning problems; (ii) hierarchical reinforcement learning performed on a sequence of thought templates instead of long chains of thought (CoTs), which optimizes a base LLM to plan an optimal template trajectory for progressively handling complex problems; and (iii) a new inference scaling system that enables hierarchical LLM reasoning by adaptively scaling thought templates at inference time. With a template trajectory of sequential thought templates, ReasonFlux-32B significantly advances math reasoning capabilities to state-of-the-art levels. Notably, on the MATH benchmark it achieves an accuracy of 91.2%, surpassing o1-preview by 6.7%. On the American Invitational Mathematics Examination (AIME) benchmark, ReasonFlux-32B solves an average of 56.7% of problems, surpassing o1-preview and DeepSeek-V3 by 27% and 45%, respectively. Code: https://github.com/Gen-Verse/ReasonFlux
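
The following is a minimal, hypothetical Python sketch of the data flow implied by innovations (i) and (ii): a library of thought templates and an ordered template trajectory built from it. It is not the ReasonFlux implementation. The names `ThoughtTemplate`, `TemplateLibrary`, and `plan_trajectory`, the template fields, the example templates, and the keyword-overlap retrieval are all assumptions made for illustration. In ReasonFlux the trajectory is planned by an LLM trained with hierarchical reinforcement learning; here the problem is instead split into sub-problems by hand and each one is matched to a template by naive tag overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThoughtTemplate:
    # Hypothetical schema (assumption); the paper's actual template format may differ.
    name: str
    tags: frozenset
    steps: tuple


class TemplateLibrary:
    """Toy template library. Retrieval by tag overlap is an assumed stand-in,
    not the retrieval method used by ReasonFlux."""

    def __init__(self, templates):
        self.templates = list(templates)

    def retrieve(self, keywords, k=1):
        # Score each template by how many query keywords appear among its tags.
        query = {w.lower() for w in keywords}
        scored = [(len(query & t.tags), t.name, t) for t in self.templates]
        # Keep templates with at least one matching tag; break score ties by name.
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [t for _, _, t in scored[:k]]


def plan_trajectory(library, subproblems):
    """Map each hand-written sub-problem (a list of keywords) to its best-matching
    template, preserving order. A trained planner would replace this loop."""
    trajectory = []
    for keywords in subproblems:
        matches = library.retrieve(keywords, k=1)
        if matches:
            trajectory.append(matches[0])
    return trajectory


# Hypothetical example templates, for illustration only.
library = TemplateLibrary([
    ThoughtTemplate("vieta", frozenset({"polynomial", "roots", "sum"}),
                    ("Write Vieta's relations", "Express the target via symmetric sums")),
    ThoughtTemplate("mod_cycle", frozenset({"remainder", "power", "modular"}),
                    ("Find the cycle of powers mod n", "Reduce the exponent")),
    ThoughtTemplate("am_gm", frozenset({"inequality", "minimum", "positive"}),
                    ("Apply AM-GM", "Check the equality case")),
])

traj = plan_trajectory(library, [["polynomial", "roots"], ["minimum", "inequality"]])
print([t.name for t in traj])  # ['vieta', 'am_gm']
```

Keeping retrieval separate from planning means the toy keyword match could, in principle, be swapped for a learned planner without changing the library or trajectory types.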
