IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

  • 2025-11-10 17:30:08
  • Guoxin Chen, Zile Qiao, Xuanzhong Chen, Donglei Yu, Haotian Xu, Wayne Xin Zhao, Ruihua Song, Wenbiao Yin, Huifeng Yin, Liwen Zhang, Kuan Li, Minpeng Liao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

Abstract

Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-contextual paradigm that accumulates all information in a single, expanding context window, leading to context suffocation and noise contamination that limit their effectiveness on long-horizon tasks. We introduce IterResearch, a novel iterative deep-research paradigm that reformulates long-horizon research as a Markov Decision Process with strategic workspace reconstruction. By maintaining an evolving report as memory and periodically synthesizing insights, our approach preserves consistent reasoning capacity across arbitrary exploration depths. We further develop Efficiency-Aware Policy Optimization (EAPO), a reinforcement learning framework that incentivizes efficient exploration through geometric reward discounting and enables stable distributed training via adaptive downsampling. Extensive experiments demonstrate that IterResearch achieves substantial improvements over existing open-source agents with average +14.5pp across six benchmarks and narrows the gap with frontier proprietary systems. Remarkably, our paradigm exhibits unprecedented interaction scaling, extending to 2048 interactions with dramatic performance gains (from 3.5% to 42.5%), and serves as an effective prompting strategy, improving frontier models by up to 19.2pp over ReAct on long-horizon tasks. These findings position IterResearch as a versatile solution for long-horizon reasoning, effective both as a trained agent and as a prompting paradigm for frontier models.
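
The two mechanisms named in the abstract, workspace reconstruction and geometric reward discounting, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Workspace`, `reconstruct`, and `discounted_rewards`, the fields of the workspace, the discount form `gamma ** (rounds remaining)`, and the default `gamma` value are all assumptions made for illustration; the abstract specifies none of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical per-round state: the agent sees only this, not the full history."""
    question: str
    report: str            # evolving report that serves as memory
    last_observation: str  # most recent tool result only


def reconstruct(ws: Workspace, new_report: str, observation: str) -> Workspace:
    """Build the next round's workspace from the updated report and latest observation.

    Earlier observations are dropped, so the workspace size does not grow
    with the number of rounds (the Markovian property the abstract describes).
    """
    return Workspace(ws.question, new_report, observation)


def discounted_rewards(final_reward: float, num_rounds: int, gamma: float = 0.995) -> list[float]:
    """Assumed geometric discounting: round t receives final_reward * gamma^(rounds remaining).

    With gamma < 1, a trajectory that reaches the same final reward in
    fewer rounds earns more at every round, which rewards efficiency.
    """
    return [final_reward * gamma ** (num_rounds - 1 - t) for t in range(num_rounds)]
```

Under these assumptions, `discounted_rewards(1.0, 3, gamma=0.5)` gives `[0.25, 0.5, 1.0]`, while a two-round trajectory with the same final reward gives `[0.5, 1.0]`, so its first round is credited more.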

 
