Abstract
Large Reasoning Models (LRMs) often produce excessively verbose reasoning traces, a phenomenon known as overthinking, which hampers both efficiency and interpretability. Prior works primarily address this issue by reducing response length, without fully examining the underlying semantic structure of the reasoning process. In this paper, we revisit overthinking by decomposing it into two distinct forms: internal redundancy, which consists of low-contribution reasoning steps within the first correct solution (FCS), and external redundancy, which refers to unnecessary continuation after the FCS. To mitigate both forms, we propose a dual-penalty reinforcement learning framework. For internal redundancy, we adopt a sliding-window semantic analysis to penalize low-gain reasoning steps that contribute little toward reaching the correct answer. For external redundancy, we penalize its proportion beyond the FCS to encourage earlier termination. Our method significantly compresses reasoning traces with minimal accuracy loss, and generalizes effectively to out-of-domain tasks such as question answering and code generation. Crucially, we find that external redundancy can be safely removed without degrading performance, whereas internal redundancy must be reduced more cautiously to avoid impairing correctness. These findings suggest that our method not only improves reasoning efficiency but also enables implicit, semantic-aware control over Chain-of-Thought length, paving the way for more concise and interpretable LRMs.
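To make the two penalties concrete, the following is a minimal sketch of how a dual-penalty reward could be assembled for one sampled trace. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the `Trace` container, the per-step gain scores (which here stand in for the paper's sliding-window semantic analysis and are supplied as plain numbers), the windowed-mean rule for flagging low-gain steps, the threshold and window size, and the penalty weights `alpha` and `beta`. External redundancy is measured as the share of tokens generated after the FCS, and internal redundancy as the share of FCS tokens that fall in low-gain steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    """Hypothetical container for one sampled reasoning trace."""
    step_tokens: List[int]   # token count of each reasoning step, in order
    step_gains: List[float]  # assumed per-step semantic gain scores in [0, 1]
    fcs_end: int             # index of the step that completes the first correct solution (FCS)
    correct: bool            # whether the final answer is correct


def internal_redundancy(trace: Trace, window: int = 2, gain_threshold: float = 0.2) -> float:
    """Share of FCS tokens that fall in low-gain steps.

    Assumption: a step counts as low-gain when the mean gain over a sliding
    window of the most recent `window` steps (ending at that step) is below
    `gain_threshold`.
    """
    fcs_steps = range(trace.fcs_end + 1)
    fcs_tokens = sum(trace.step_tokens[i] for i in fcs_steps)
    low_gain_tokens = 0
    for i in fcs_steps:
        start = max(0, i - window + 1)
        window_gain = sum(trace.step_gains[start:i + 1]) / (i + 1 - start)
        if window_gain < gain_threshold:
            low_gain_tokens += trace.step_tokens[i]
    return low_gain_tokens / fcs_tokens if fcs_tokens else 0.0


def external_redundancy(trace: Trace) -> float:
    """Share of all tokens generated after the FCS has been completed."""
    total = sum(trace.step_tokens)
    after_fcs = sum(trace.step_tokens[trace.fcs_end + 1:])
    return after_fcs / total if total else 0.0


def dual_penalty_reward(trace: Trace, alpha: float = 0.5, beta: float = 0.2) -> float:
    """Correctness reward minus both redundancy penalties (hypothetical weights).

    beta < alpha mirrors the finding that internal redundancy should be
    reduced more cautiously than external redundancy.
    """
    if not trace.correct:
        return 0.0
    return 1.0 - alpha * external_redundancy(trace) - beta * internal_redundancy(trace)


if __name__ == "__main__":
    # Five steps; the FCS ends at step 3, and step 4 is post-FCS continuation.
    t = Trace(step_tokens=[10, 10, 10, 10, 20],
              step_gains=[0.5, 0.05, 0.05, 0.6, 0.0],
              fcs_end=3, correct=True)
    print(round(dual_penalty_reward(t), 4))  # 0.7833
```

In this example, 20 of 60 tokens come after the FCS (external redundancy 1/3), and only step 2 is flagged as low-gain, giving 10 of 40 FCS tokens (internal redundancy 0.25). Giving the internal term a smaller weight is one simple way to reflect the abstract's observation that cutting internal steps too aggressively can hurt correctness.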