Effectively Controlling Reasoning Models through Thinking Intervention

Abstract

Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers, helping the model excel in complex problem-solving. In this paper, we demonstrate that this emerging generation framework offers a unique opportunity for more fine-grained control over model behavior. We propose Thinking Intervention, a novel paradigm designed to explicitly guide the internal reasoning processes of LLMs by strategically inserting or revising specific thinking tokens. We find that the Thinking Intervention paradigm enhances the capabilities of reasoning models across a wide range of tasks, including instruction following on IFEval and Overthinking, instruction hierarchy on SEP, and safety alignment on XSTest and SorryBench. Our results demonstrate that Thinking Intervention significantly outperforms baseline prompting approaches, achieving up to 6.7% accuracy gains in instruction-following scenarios, 15.4% improvements in reasoning about instruction hierarchies, and a 40.0% increase in refusal rates for unsafe prompts using open-source DeepSeek R1 models. Overall, our work opens a promising new research avenue for controlling reasoning LLMs.
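
To make the idea of inserting thinking tokens concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a reasoning model whose reasoning is delimited by an opening `<think>` marker and which continues generating from a partially filled reasoning block. The function name `build_intervened_prompt`, the plain-text prompt layout, and the example intervention sentence are all hypothetical and chosen only for illustration.

```python
def build_intervened_prompt(user_prompt: str, intervention: str,
                            think_open: str = "<think>") -> str:
    """Return text for a reasoning model to continue, with its reasoning pre-seeded.

    Assumption: the model treats everything after `think_open` as its own
    reasoning and keeps writing from where the intervention text ends.
    """
    # The reasoning block is deliberately left unclosed so the model
    # continues thinking after the inserted sentence.
    return f"{user_prompt}\n{think_open}\n{intervention.strip()}\n"


# Hypothetical instruction-following example.
prompt = build_intervened_prompt(
    "Summarize the article in exactly three sentences.",
    "I should make sure my answer has exactly three sentences.",
)
print(prompt)
```

In a real deployment the surrounding chat template and special tokens depend on the specific model, so the string layout above would need to be adapted to that model's format.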
