Think Only When You Need with Large Hybrid-Reasoning Models

Abstract

Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessively lengthy thinking introduces substantial overhead in terms of token consumption and latency, which is particularly unnecessary for simple queries. In this work, we introduce Large Hybrid-Reasoning Models (LHRMs), the first kind of model capable of adaptively determining whether to perform thinking based on the contextual information of user queries. To achieve this, we propose a two-stage training pipeline comprising Hybrid Fine-Tuning (HFT) as a cold start, followed by online reinforcement learning with the proposed Hybrid Group Policy Optimization (HGPO) to implicitly learn to select the appropriate thinking mode. Furthermore, we introduce a metric called Hybrid Accuracy to quantitatively assess the model's capability for hybrid thinking. Extensive experimental results show that LHRMs can adaptively perform hybrid thinking on queries of varying difficulty and type. They outperform existing LRMs and LLMs in reasoning and general capabilities while significantly improving efficiency. Together, our work advocates for a reconsideration of the appropriate use of extended thinking processes and provides a solid starting point for building hybrid thinking systems.
