MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Abstract

We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset. MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us. It presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales and also ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT not only unleashes the potential of tool use but also allows different thought processes for different math problems. As a result, the MAmmoTH series substantially outperforms existing open-source models on nine mathematical reasoning datasets across all scales, with an average accuracy gain of between 13% and 29%. Remarkably, our MAmmoTH-7B model reaches 35% on MATH (a competition-level dataset), which exceeds the best open-source 7B model (WizardMath) by 25%, and the MAmmoTH-34B model achieves 46% accuracy on MATH, even surpassing GPT-4's CoT result. Our work underscores the importance of diverse problem coverage and the use of hybrid rationales in developing superior math generalist models.
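To make the CoT/PoT distinction concrete, the sketch below shows one problem written with both rationale styles and a simple way to combine them at answer time. It is a hypothetical illustration, not MAmmoTH's released code or the MathInstruct schema. The following are all assumptions made for illustration: the record fields (`question`, `style`, `rationale`), the helper names `run_pot`, `extract_cot_answer`, and `hybrid_answer`, the convention that a PoT program binds its result to `answer`, and the policy of trusting the program first and falling back to the CoT answer when the program fails.

```python
import re

# Hypothetical training records showing the two rationale styles for one problem.
# Field names ("question", "style", "rationale") are assumptions, not the
# actual MathInstruct schema.
QUESTION = "A box holds 12 eggs. Tom buys 7 boxes and 5 eggs break. How many eggs are left?"

COT_RECORD = {
    "question": QUESTION,
    "style": "CoT",
    # Natural-language reasoning that ends with an explicit answer phrase.
    "rationale": "7 boxes hold 7 * 12 = 84 eggs. After 5 break, 84 - 5 = 79 remain. The answer is 79.",
}

POT_RECORD = {
    "question": QUESTION,
    "style": "PoT",
    # Executable Python; by assumption the final value is bound to `answer`.
    "rationale": "eggs_per_box = 12\nboxes = 7\nbroken = 5\nanswer = eggs_per_box * boxes - broken\n",
}


def run_pot(program):
    """Execute a PoT rationale and return `answer`, or None if it fails.

    Emptying __builtins__ only keeps this toy example small; it is NOT a
    security sandbox for untrusted model output.
    """
    namespace = {}
    try:
        exec(program, {"__builtins__": {}}, namespace)
    except Exception:
        return None
    return namespace.get("answer")


def extract_cot_answer(text):
    """Parse the number after 'The answer is' in a CoT rationale, or None."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


def hybrid_answer(pot_program, cot_text):
    """Assumed policy: use the program's result; fall back to CoT if it fails."""
    result = run_pot(pot_program)
    if result is not None:
        return result
    return extract_cot_answer(cot_text)


if __name__ == "__main__":
    print(hybrid_answer(POT_RECORD["rationale"], COT_RECORD["rationale"]))  # 79
    # A program that raises (undefined name) triggers the CoT fallback.
    print(hybrid_answer("answer = undefined_name + 1", COT_RECORD["rationale"]))  # 79.0
```

Preferring the program's result when it runs mirrors the tool-use benefit described above (the Python interpreter does the arithmetic), while the CoT rationale still provides an answer for problems whose program fails or that are awkward to express as code.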
