Abstract
Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-contextual paradigm that accumulates all information in a single, ever-expanding context window. This leads to context suffocation and noise contamination, which limit their effectiveness on long-horizon tasks. We introduce IterResearch, a novel iterative deep-research paradigm that reformulates long-horizon research as a Markov Decision Process with strategic workspace reconstruction. By maintaining an evolving report as memory and periodically synthesizing insights, our approach preserves consistent reasoning capacity across arbitrary exploration depths. We further develop Efficiency-Aware Policy Optimization (EAPO), a reinforcement learning framework that incentivizes efficient exploration through geometric reward discounting and enables stable distributed training via adaptive downsampling. Extensive experiments demonstrate that IterResearch achieves substantial improvements over existing open-source agents, with an average gain of +14.5pp across six benchmarks, and narrows the gap with frontier proprietary systems. Remarkably, our paradigm exhibits unprecedented interaction scaling: extending to 2048 interactions yields dramatic performance gains (from 3.5% to 42.5%). It also serves as an effective prompting strategy, improving frontier models by up to 19.2pp over ReAct on long-horizon tasks. These findings position IterResearch as a versatile solution for long-horizon reasoning, effective both as a trained agent and as a prompting paradigm for frontier models.
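To make "geometric reward discounting" concrete, here is a minimal Python sketch under an assumed form. A trajectory of T rounds that ends with terminal reward R assigns round t (1-indexed) the reward gamma^(T - t) * R. The final round keeps the full reward, and earlier rounds are discounted more heavily the longer the trajectory runs, so a correct answer reached in fewer rounds earns a higher average per-round reward. The function name `geometric_discounted_rewards` and the default `gamma` are hypothetical, and EAPO's actual reward shaping may differ.

```python
def geometric_discounted_rewards(
    num_rounds: int, terminal_reward: float, gamma: float = 0.99
) -> list[float]:
    """Hypothetical per-round rewards for one trajectory (assumed form, not EAPO's
    confirmed definition): round t of T receives gamma ** (T - t) * terminal_reward."""
    if num_rounds < 1:
        raise ValueError("num_rounds must be at least 1")
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    # The last round (t == T) gets the undiscounted terminal reward.
    return [gamma ** (num_rounds - t) * terminal_reward for t in range(1, num_rounds + 1)]


if __name__ == "__main__":
    short_traj = geometric_discounted_rewards(3, 1.0, gamma=0.5)  # [0.25, 0.5, 1.0]
    long_traj = geometric_discounted_rewards(6, 1.0, gamma=0.5)
    # A shorter correct trajectory earns a higher mean per-round reward.
    print(sum(short_traj) / len(short_traj) > sum(long_traj) / len(long_traj))  # True
```

Under this assumed form, the discount factor gamma controls how strongly long explorations are penalized; with gamma = 1 every round receives the full terminal reward and the efficiency incentive disappears.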