Abstract
Large Language Models (LLMs) with chains-of-thought have demonstrated strong performance on an increasing range of tasks, particularly those involving complex logical reasoning. However, excessively long chains can lead to overthinking, causing computational waste and slower responses. This raises a question: can LLMs dynamically adjust the length of their reasoning processes based on task complexity? To address this, we propose the Think in Blocks framework, which enables adaptive reasoning, from zero to deep reasoning, by partitioning the reasoning process into a tunable number of blocks. Our main contributions are: (1) Establishing an explicit block-structured paradigm in which the model first predicts an integer reasoning budget (the number of blocks) and then partitions its reasoning accordingly; (2) Training an adaptive model through a three-stage pipeline (Supervised Fine-Tuning, reward-guided Direct Preference Optimization, and Reinforcement Learning) that adjusts its reasoning depth to problem difficulty; (3) Exploiting the explicit block count to dynamically control reasoning depth at inference time, allowing flexible adjustment of chain-of-thought length during deployment.