Optimizing Length Compression in Large Reasoning Models

Abstract

Large Reasoning Models (LRMs) have achieved remarkable success, yet they often suffer from producing unnecessary and verbose reasoning chains. We identify a core aspect of this issue as "invalid thinking": models tend to repeatedly double-check their work after having derived the correct answer. To address this specific inefficiency, we move beyond the general principles of Efficacy and Efficiency to propose two new, fine-grained principles: Brevity, which advocates for eliminating redundancy, and Sufficiency, which ensures critical reasoning steps are preserved. Guided by these principles, we introduce LC-R1, a post-training method based on Group Relative Policy Optimization (GRPO). LC-R1 employs a novel combination of a Length Reward for overall conciseness and a Compress Reward that is specifically designed to remove the invalid portion of the thinking process. Extensive experiments on multiple reasoning benchmarks demonstrate that LC-R1 achieves a significant reduction in sequence length (~50%) with only a marginal (~2%) drop in accuracy, achieving a favorable trade-off point on the Pareto frontier that prioritizes high compression. Our analysis further validates the robustness of LC-R1 and provides valuable insights for developing more powerful yet computationally efficient LRMs. Our code is released at https://github.com/zxiangx/LC-R1.
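
To make the notion of "invalid thinking" and the group-relative signal used by GRPO concrete, the following is a minimal illustrative sketch, not the LC-R1 implementation. It assumes the reasoning chain is available as a list of tokens and that the correct answer appears as a single token; the function names `invalid_fraction` and `group_relative_advantages`, and the toy reward in the usage lines, are hypothetical and are not taken from the paper or its repository. The only part stated with confidence is the standard GRPO-style normalization of rewards within a group of sampled responses.

```python
from statistics import mean, pstdev


def invalid_fraction(thinking_tokens, answer):
    """Share of the thinking that comes AFTER the answer first appears.

    Assumption: tokens emitted after the first occurrence of the correct
    answer are treated as redundant double-checking ("invalid thinking").
    """
    if answer not in thinking_tokens:
        return 0.0  # answer never derived, so nothing is counted as invalid
    first = thinking_tokens.index(answer)
    return (len(thinking_tokens) - (first + 1)) / len(thinking_tokens)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy usage: two sampled chains for the same prompt, answer token "42".
concise = ["2", "*", "21", "=", "42"]
verbose = ["2", "*", "21", "=", "42", "wait", "recheck", "yes", "42", "ok"]

# Hypothetical toy reward (NOT the paper's formula): penalize invalid thinking.
rewards = [1.0 - invalid_fraction(chain, "42") for chain in (concise, verbose)]
advantages = group_relative_advantages(rewards)
```

In this toy setup the concise chain receives the higher reward and therefore a positive advantage, while the chain that keeps re-checking after reaching the answer receives a negative one.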
