CoT-Valve: Length-Compressible Chain-of-Thought Tuning

Abstract

Chain-of-Thought significantly enhances a model's reasoning capability, but it also comes with a considerable increase in inference costs due to long chains. With the observation that the reasoning path can be easily compressed on easy tasks but is harder to compress on hard tasks, we explore the feasibility of elastically controlling the length of reasoning paths with only one model, thereby reducing the inference overhead of reasoning models dynamically based on task difficulty. We introduce a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths. To achieve this, we propose to identify a direction in the parameter space that, when manipulated, can effectively control the length of the generated CoT. Moreover, we show that this property is valuable for compressing the reasoning chain. We construct datasets with chains from long to short for the same questions and explore two enhanced strategies for CoT-Valve: (1) a precise length-compressible CoT tuning method, and (2) a progressive chain length compression approach. Our experiments show that CoT-Valve successfully enables controllability and compressibility of the chain and performs better than prompt-based control. We applied this method to QwQ-32B-Preview, reducing reasoning chains on GSM8K from 741 to 225 tokens with a minor performance drop (95.07% to 94.92%) and on AIME from 6827 to 4629 tokens, with only one additional incorrect answer.
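
The idea of a controllable direction in parameter space can be illustrated with a minimal sketch. This is not the authors' implementation: the names `apply_valve`, `base`, `delta`, and `alpha` are hypothetical, the parameters are toy lists rather than real model weights, and the sketch assumes the direction is simply scaled and added to the base parameters. The second helper only reproduces the arithmetic behind the token counts reported above.

```python
def apply_valve(base, delta, alpha):
    """Move toy parameters along a direction: base + alpha * delta.

    Assumption (not from the paper's code): alpha = 0 leaves the base
    parameters unchanged, and larger alpha moves further along delta.
    """
    if len(base) != len(delta):
        raise ValueError("base and delta must have the same length")
    return [b + alpha * d for b, d in zip(base, delta)]


def token_reduction(before, after):
    """Fraction of tokens removed when a chain shrinks from before to after."""
    return 1.0 - after / before


# Toy parameter vector and a hypothetical length-control direction.
base = [0.5, -1.0, 2.0]
delta = [0.1, 0.2, -0.4]

unchanged = apply_valve(base, delta, 0.0)  # identical to base
halfway = apply_valve(base, delta, 0.5)    # intermediate setting
full = apply_valve(base, delta, 1.0)       # full step along delta

# Reductions implied by the token counts quoted in the abstract.
gsm8k_reduction = token_reduction(741, 225)
aime_reduction = token_reduction(6827, 4629)
print(f"GSM8K: {gsm8k_reduction:.1%}, AIME: {aime_reduction:.1%}")
```

With the reported figures, the GSM8K chains shrink by roughly 70% and the AIME chains by roughly 32%, which matches the observation that harder tasks leave less room for compression.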
