Understanding Tool-Integrated Reasoning

Abstract

We study why Tool-Integrated Reasoning (TIR) makes Large Language Models (LLMs) more capable. While LLMs integrated with tools like Python code interpreters show great promise, a principled theory explaining why this paradigm is effective has been missing. This work provides the first formal proof that TIR fundamentally expands an LLM's capabilities. We demonstrate that tools enable a strict expansion of the model's empirical and feasible support, breaking the capability ceiling of pure-text models by unlocking problem-solving strategies that are otherwise impossible or intractably verbose. To guide model behavior without compromising training stability and performance, we also introduce Advantage Shaping Policy Optimization (ASPO), a novel algorithm that directly modifies the advantage function to guide the policy behavior. We conduct comprehensive experiments on challenging mathematical benchmarks, leveraging a Python interpreter as the external tool. Our results show that the TIR model decisively outperforms its pure-text counterpart on the pass@k metric. Crucially, this advantage is not confined to computationally intensive problems but extends to those requiring significant abstract insight. We further identify the emergent cognitive patterns that illustrate how models learn to think with tools. Finally, we report improved tool-usage behavior with ASPO: early code invocation and far more interactive turns. Overall, our work provides the first principled explanation for TIR's success, shifting the focus from the mere fact that tools work to why and how they enable more powerful reasoning.
