Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Abstract

The current paradigm of test-time scaling relies on generating long reasoning traces ("thinking" more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we study the domain of web agents. We first show that even prompting-based interaction scaling without any training can improve task success on web benchmarks non-trivially. Building on this, we introduce TTI (Test-Time Interaction), a curriculum-based online reinforcement learning (RL) approach that trains agents by adaptively adjusting their rollout lengths. Using a Gemma 3 12B model, TTI produces state-of-the-art open-source, open-data web agents on WebVoyager and WebArena benchmarks. We further show that TTI enables agents to balance exploration and exploitation adaptively. Our results establish interaction scaling as a powerful, complementary axis to scaling per-step compute, offering new avenues for training adaptive agents.
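
To make the idea of adjusting rollout lengths over a curriculum concrete, the following is a minimal sketch rather than the TTI implementation. The abstract does not specify the schedule, so the linear growth rule, the function names `horizon_at` and `rollout`, and every constant below are illustrative assumptions.

```python
from typing import Callable, List, Tuple

State = int
Action = int


def horizon_at(iteration: int, h_min: int = 5, h_max: int = 30,
               step: int = 5, every: int = 100) -> int:
    """Hypothetical curriculum: the maximum number of interaction steps.

    Starts at h_min and grows by `step` every `every` training iterations,
    capped at h_max. All values are assumptions, not taken from the paper.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return min(h_min + step * (iteration // every), h_max)


def rollout(policy: Callable[[State], Action],
            env_step: Callable[[State, Action], Tuple[State, bool]],
            start_state: State,
            horizon: int) -> Tuple[List[Tuple[State, Action]], State]:
    """Interact with the environment for at most `horizon` steps."""
    state = start_state
    trajectory: List[Tuple[State, Action]] = []
    for _ in range(horizon):
        action = policy(state)
        next_state, done = env_step(state, action)
        trajectory.append((state, action))
        state = next_state
        if done:
            break
    return trajectory, state


# Toy environment (an assumption for illustration): the task is solved
# once the integer state reaches 8, and each action adds to the state.
def toy_env_step(state: State, action: Action) -> Tuple[State, bool]:
    new_state = state + action
    return new_state, new_state >= 8


def toy_policy(state: State) -> Action:
    return 1


short_traj, short_final = rollout(toy_policy, toy_env_step, 0, horizon_at(0))
long_traj, long_final = rollout(toy_policy, toy_env_step, 0, horizon_at(10_000))
```

In this toy setting the short early-curriculum horizon ends the rollout before the goal state is reached, while the longer late-curriculum horizon leaves enough steps to finish. That is the sense in which a larger interaction budget can enable behavior that per-step reasoning alone cannot.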
