MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Abstract

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite this success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory at solving mathematical problems because of the complex reasoning procedures involved. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions, rewriting each question from multiple perspectives without extra knowledge, which results in a new dataset called MetaMathQA. We then fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks for mathematical reasoning (i.e., GSM8K and MATH) demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%, respectively. In particular, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, the MetaMath models in different sizes, and the training code for public use.
