MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy

Abstract

Large language models have achieved substantial progress in mathematical reasoning, yet their advancement is limited by the scarcity of high-quality, high-difficulty training data. Existing synthesis methods largely rely on transforming human-written templates, limiting both diversity and scalability. We propose MathSmith, a novel framework for synthesizing challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath, ensuring data independence and avoiding contamination. To increase difficulty, we design nine predefined strategies that act as soft constraints during rationale generation. We further adopt reinforcement learning to jointly optimize structural validity, reasoning complexity, and answer consistency. The length of the reasoning trace generated under autoregressive prompting is used to reflect cognitive complexity, encouraging the creation of more demanding problems aligned with long chain-of-thought (CoT) reasoning. Experiments across five benchmarks, categorized as easy & medium (GSM8K, MATH-500) and hard (AIME2024, AIME2025, OlympiadBench), show that MathSmith consistently outperforms existing baselines under both short and long CoT settings. Additionally, a weakness-focused variant generation module enables targeted improvement on specific concepts. Overall, MathSmith exhibits strong scalability, generalization, and transferability, highlighting the promise of high-difficulty synthetic data in advancing LLM reasoning capabilities.
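The idea of a single reinforcement-learning reward that jointly scores structural validity, reasoning complexity (via trace length), and answer consistency can be illustrated with a minimal sketch. This is not the MathSmith implementation. The abstract names the three objectives but not how they are measured or combined, so everything below is a hypothetical assumption: the `<rationale>`/`<problem>` tag format, the 16384-token budget used to normalize trace length, majority-vote agreement as the consistency measure, zero reward for malformed outputs, and equal weights.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract does not specify any values.
    validity: float = 1.0
    complexity: float = 1.0
    consistency: float = 1.0


def structural_validity(problem_text: str) -> float:
    # Hypothetical check: the output must contain a rationale and a problem
    # section. The tag names are assumptions made for this sketch.
    has_rationale = re.search(r"<rationale>.*?</rationale>", problem_text, re.S) is not None
    has_problem = re.search(r"<problem>.*?</problem>", problem_text, re.S) is not None
    return 1.0 if (has_rationale and has_problem) else 0.0


def complexity_score(trace_tokens: int, max_tokens: int = 16384) -> float:
    # Solver trace length as a proxy for cognitive complexity,
    # clipped and scaled to [0, 1] by an assumed token budget.
    return min(trace_tokens, max_tokens) / max_tokens


def answer_consistency(answers: list[str]) -> float:
    # Fraction of sampled solver answers that agree with the majority answer.
    if not answers:
        return 0.0
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)


def composite_reward(problem_text: str, trace_tokens: int, answers: list[str],
                     w: RewardWeights = RewardWeights()) -> float:
    validity = structural_validity(problem_text)
    if validity == 0.0:
        # Assumed gating: malformed outputs get no credit for length or consistency.
        return 0.0
    return (w.validity * validity
            + w.complexity * complexity_score(trace_tokens)
            + w.consistency * answer_consistency(answers))


if __name__ == "__main__":
    sample = "<rationale>Combine concepts A and B.</rationale><problem>Find x.</problem>"
    # validity 1.0 + complexity 8192/16384 + consistency 3/4
    print(composite_reward(sample, trace_tokens=8192, answers=["12", "12", "12", "7"]))  # 2.25
```

Gating on structural validity is one simple way to keep the policy from being rewarded for long but malformed outputs; a weighted sum without gating, or a multiplicative combination, would be equally plausible readings of "jointly optimize".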
