Abstract
Large Language Models (LLMs), when enhanced through reasoning-oriented post-training, evolve into powerful Large Reasoning Models (LRMs). Tool-Integrated Reasoning (TIR) further extends their capabilities by incorporating external tools, but existing methods often rely on rigid, predefined tool-use patterns that risk degrading core language competence. Inspired by the human ability to adaptively select tools, we introduce AutoTIR, a reinforcement learning framework that enables LLMs to autonomously decide whether to invoke a tool and which one to use during the reasoning process, rather than following static tool-use strategies. AutoTIR leverages a hybrid reward mechanism that jointly optimizes for task-specific answer correctness, structured output adherence, and penalization of incorrect tool usage, thereby encouraging both precise reasoning and efficient tool integration. Extensive evaluations across diverse knowledge-intensive, mathematical, and general language modeling tasks demonstrate that AutoTIR achieves superior overall performance, significantly outperforming baselines and exhibiting stronger generalization in tool-use behavior. These results highlight the promise of reinforcement learning in building truly generalizable and scalable TIR capabilities in LLMs. The code and data are available at https://github.com/weiyifan1023/AutoTIR.