Abstract
Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales with 13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the competition-level dataset MATH, surpassing the best open-source model WizardMath-70B by 22% absolute. ToRA-34B is also the first open-source model that achieves an accuracy exceeding 50% on MATH, which significantly outperforms GPT-4's CoT result, and is competitive with GPT-4 solving problems with programs. Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of tool interaction for mathematical reasoning, providing valuable insights for future research.