Abstract
Recently, large reasoning models have achieved impressive performance on various tasks by employing human-like deep thinking. However, the lengthy thinking process substantially increases inference overhead, making efficiency a critical bottleneck. In this work, we first demonstrate that NoThinking, which prompts the reasoning model to skip thinking and directly generate the final solution, is a better choice for relatively simple tasks in terms of both performance and efficiency. Motivated by this, we propose AdaptThink, a novel RL algorithm that teaches reasoning models to choose the optimal thinking mode adaptively based on problem difficulty. Specifically, AdaptThink features two core components: (1) a constrained optimization objective that encourages the model to choose NoThinking while maintaining overall performance; (2) an importance sampling strategy that balances Thinking and NoThinking samples during on-policy training, thereby enabling cold start and allowing the model to explore and exploit both thinking modes throughout the training process. Our experiments indicate that AdaptThink significantly reduces inference costs while further enhancing performance. Notably, on three math datasets, AdaptThink reduces the average response length of DeepSeek-R1-Distill-Qwen-1.5B by 53% and improves its accuracy by 2.4%, highlighting the promise of adaptive thinking-mode selection for optimizing the balance between reasoning quality and efficiency. Our code and models are available at https://github.com/THU-KEG/AdaptThink.
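To make the two components concrete, the following is a minimal sketch under stated assumptions, not the implementation from the paper. The function names, the bonus weight `delta`, the penalty form of the constraint, and the fixed 50/50 sampling split are all hypothetical choices made for illustration; the abstract does not specify them.

```python
def penalty_score(no_thinking, reward, ref_reward, delta=0.05):
    """Hypothetical per-sample score for the constrained objective.

    A NoThinking response earns a small bonus (delta, an assumed value),
    while the (reward - ref_reward) term penalizes any drop in accuracy
    relative to a reference model's mean reward on the same problem.
    """
    bonus = delta if no_thinking else 0.0
    return bonus + (reward - ref_reward)


def balanced_modes(k):
    """Assign thinking modes to k samples so that both modes are present.

    Assumed stand-in for the sampling strategy: half of the samples are
    forced to skip thinking and the rest think, regardless of what the
    current policy would prefer.
    """
    half = k // 2
    return ["nothinking"] * half + ["thinking"] * (k - half)


def importance_weight(p_model, p_sampling=0.5):
    """Standard importance weight for a mode drawn with probability p_sampling.

    p_model is the probability the current policy assigns to that mode.
    """
    return p_model / p_sampling
```

Under this sketch, a correct NoThinking answer scores higher than a correct Thinking answer, but an incorrect NoThinking answer still scores below zero whenever the reference reward exceeds `delta`, which is how the accuracy constraint is meant to bite.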