Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers

Abstract

We study privacy leakage in the reasoning traces of large reasoning models used as personal agents. Unlike final outputs, reasoning traces are often assumed to be internal and safe. We challenge this assumption by showing that reasoning traces frequently contain sensitive user data, which can be extracted via prompt injections or accidentally leak into outputs. Through probing and agentic evaluations, we demonstrate that test-time compute approaches, particularly increased reasoning steps, amplify such leakage. While increasing the budget of those test-time compute approaches makes models more cautious in their final answers, it also leads them to reason more verbosely and leak more in their own thinking. This reveals a core tension: reasoning improves utility but enlarges the privacy attack surface. We argue that safety efforts must extend to the model's internal thinking, not just its outputs.
