Abstract
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT is trained with an automatic-thinking paradigm that lets it switch dynamically between reasoning and non-reasoning modes based on task complexity. First, we construct a dual-regime dataset using a novel tagging pipeline and a multi-agent synthesis strategy, and then apply Multi-Token Prediction (MTP)-enhanced knowledge distillation, enabling efficient and fine-grained reasoning transfer with minimal pretraining cost. Second, we implement a cold-start initialization strategy that introduces mode-selection priors using majority-vote signals and intent-aware prompting. Finally, we propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework, offering structured guidance over both reasoning-mode selection and response accuracy. Extensive experiments across multiple benchmarks demonstrate that KAT consistently matches or even outperforms current state-of-the-art models, including DeepSeek-R1-0528 and Qwen3-235B-A22B, across a wide range of reasoning-intensive tasks while reducing token usage. Notably, KAT outperforms all open-source models and even surpasses o3-mini on the leakage-controlled LiveCodeBench Pro. Beyond academic evaluation, KAT has been successfully deployed in Kwaipilot (i.e., Kuaishou's internal coding assistant), where it improves real-world development workflows with high accuracy, efficiency, and controllable reasoning behaviors. Moreover, we are actively training a 200B Mixture-of-Experts (MoE) model with 40B active parameters, and early results already show significant gains, further demonstrating the scalability of the AutoThink paradigm.