KAT-V1: Kwai-AutoThink Technical Report

  • 2025-07-21 10:37:40
  • Zizheng Zhan, Ken Deng, Huaixi Tang, Wen Xiang, Kun Wu, Weihao Li, Wenqiang Zhu, Jingxuan Xu, Lecheng Huang, Zongxian Feng, Shaojie Wang, Shangpeng Yan, Xuxing Chen, Jiaheng Liu, Zhongyuan Peng, Zuchen Gao, Haoyang Huang, Xiaojiang Zhang, Jinghui Wang, Zheng Lin, Mengtong Li, Huiming Wang, Ziqi Zhan, Yanan Wu, Yuanxing Zhang, Jian Yang, Guang Chen, Haotian Zhang, Bin Chen, Bing Yu

Abstract

We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. It is trained with an automatic thinking paradigm that dynamically switches between reasoning and non-reasoning modes based on task complexity. Specifically, we first construct a dual-regime dataset using a novel tagging pipeline and a multi-agent synthesis strategy, and then apply Multi-Token Prediction (MTP)-enhanced knowledge distillation, enabling efficient and fine-grained reasoning transfer with minimal pretraining cost. In addition, we implement a cold-start initialization strategy that introduces mode-selection priors using majority-vote signals and intent-aware prompting. Finally, we propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework, offering structured guidance over both reasoning-mode selection and response accuracy. Extensive experiments across multiple benchmarks demonstrate that KAT consistently matches or even outperforms current state-of-the-art models, including DeepSeek-R1-0528 and Qwen3-235B-A22B, across a wide range of reasoning-intensive tasks while reducing token usage. Notably, KAT outperforms all open-source models and even surpasses o3-mini on the leakage-controlled LiveCodeBench Pro. Beyond academic evaluation, KAT has been successfully deployed in Kwaipilot (i.e., Kuaishou's internal coding assistant), where it improves real-world development workflows with high accuracy, efficiency, and controllable reasoning behaviors. Moreover, we are actively training a 200B Mixture-of-Experts (MoE) model with 40B active parameters, and early results already show significant gains, further demonstrating the scalability of the AutoThink paradigm.

 
