InteractComp: Evaluating Search Agents With Ambiguous Queries

Abstract

Language agents have demonstrated remarkable potential in web search and information retrieval. However, these search agents assume that user queries are complete and unambiguous. This assumption diverges from reality, where users often begin with incomplete queries that require clarification through interaction. Yet most agents lack interactive mechanisms during the search process, and existing benchmarks cannot assess this capability. To address this gap, we introduce InteractComp, a benchmark designed to evaluate whether search agents can recognize query ambiguity and actively interact to resolve it during search. Following the principle "easy to verify, interact to disambiguate," we construct 210 expert-curated questions across 9 domains using a target-distractor methodology that creates genuine ambiguity resolvable only through interaction. Evaluation of 17 models reveals a striking failure: the best model achieves only 13.73% accuracy, despite reaching 71.50% when given complete context. This gap exposes systematic overconfidence rather than reasoning deficits. Forced interaction produces dramatic gains, demonstrating a latent capability that current strategies fail to engage. Longitudinal analysis shows that interaction capabilities stagnated over 15 months while search performance improved seven-fold, revealing a critical blind spot. This stagnation, coupled with the immediate feedback inherent to search tasks, makes InteractComp a valuable resource for both evaluating and training interaction capabilities in search agents. The code is available at https://github.com/FoundationAgents/InteractComp.