Abstract
Recent advancements in large language models (LLMs) have greatly improved their capabilities on complex reasoning tasks through Long Chain-of-Thought (CoT). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. To improve efficiency, current methods often rely on human-defined difficulty priors, which do not align with the LLM's own perception of difficulty, leading to inefficiencies. In this paper, we introduce the Dynamic Reasoning-Boundary Self-Awareness Framework (DR. SAF), which enables models to dynamically assess and adjust their reasoning depth in response to problem complexity. DR. SAF integrates three key components: Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism. These components allow models to optimize their reasoning processes, balancing efficiency and accuracy without compromising performance. Our experimental results demonstrate that DR. SAF achieves a 49.27% reduction in total response tokens with minimal loss in accuracy. The framework also delivers a 6.59x gain in token efficiency and a 5x reduction in training time, making it well-suited to resource-limited settings. Under extreme training, DR. SAF can even surpass traditional instruction-based models in token efficiency while improving accuracy by more than 16%.