Abstract
We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on the given seed tasks, and then to generate a new synthetic prompt of similar quality and complexity for use in LLM training, followed by filtering for high-quality data with automatic metrics. In verifiable reasoning, our synthetic data significantly outperforms existing training datasets, such as s1k and OpenMathReasoning, across MATH500, AMC23, AIME24 and GPQA-Diamond. For non-verifiable instruction-following tasks, our method surpasses the performance of human or standard self-instruct prompts on both AlpacaEval 2.0 and Arena-Hard.
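The generate-then-filter loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: the prompt wording, the `NEW TASK:` marker, the function names, and the scalar `score` with a fixed `threshold` are all assumptions made for illustration. The LLM call and the automatic metric are passed in as plain callables, so no particular model API is implied.

```python
import random
from typing import Callable, List, Optional


def build_cot_prompt(seed_tasks: List[str]) -> str:
    """Ask the model to reason and plan first, then emit one new task.

    The wording and the 'NEW TASK:' marker are hypothetical.
    """
    numbered = "\n".join(
        f"Seed task {i + 1}: {task}" for i, task in enumerate(seed_tasks)
    )
    return (
        f"{numbered}\n\n"
        "First reason step by step about what makes these tasks well posed "
        "and challenging, and plan a new task of similar quality and "
        "complexity.\n"
        "Then write the new task after the marker 'NEW TASK:'."
    )


def extract_new_task(response: str, marker: str = "NEW TASK:") -> Optional[str]:
    """Return the text after the last marker, or None if it is missing or empty."""
    _, found, tail = response.rpartition(marker)
    task = tail.strip()
    return task if found and task else None


def generate_filtered(
    seed_pool: List[str],
    generate: Callable[[str], str],   # assumed: prompt -> model response
    score: Callable[[str], float],    # assumed: automatic quality metric
    n_candidates: int,
    threshold: float,
    seeds_per_prompt: int = 2,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Sample seeds, generate a candidate via CoT, keep it if it scores well."""
    rng = rng or random.Random(0)
    kept: List[str] = []
    for _ in range(n_candidates):
        seeds = rng.sample(seed_pool, seeds_per_prompt)
        task = extract_new_task(generate(build_cot_prompt(seeds)))
        if task is not None and score(task) >= threshold:
            kept.append(task)
    return kept
```

In this sketch the reasoning text before the marker is discarded and only the final task is kept for training. The choice of `score` is left open here; the abstract specifies only that filtering uses automatic metrics.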