Abstract
Modern language agents must operate over long-horizon, multi-turn interactions, where they retrieve external information, adapt to observations, and answer interdependent queries. Yet, most LLM systems rely on full-context prompting, appending all past turns regardless of their relevance. This leads to unbounded memory growth, increased computational costs, and degraded reasoning performance on out-of-distribution input lengths. We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. This state integrates prior memory with new observations from the environment while strategically discarding irrelevant or redundant information. To support training in more realistic and compositional settings, we propose a simple yet effective and scalable approach to constructing multi-turn environments by composing existing datasets into arbitrarily complex task sequences. Experiments across three domains, including internal retrieval QA, open-domain web QA, and multi-turn web shopping, show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task, and generalizes beyond the training horizon. Our results demonstrate the promise of reasoning-driven memory consolidation as a scalable alternative to existing solutions for training long-horizon interactive agents, where both efficiency and performance are optimized.
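
The contrast between full-context prompting and a constant-size internal state can be illustrated with a toy sketch. This is not the MEM1 implementation: in MEM1 the consolidation step is performed by the language model and learned with reinforcement learning, whereas the hypothetical `keep_recent` function below is a fixed rule that simply keeps the most recent characters. The function names, the character-based size measure, and the `budget` parameter are all assumptions made for illustration.

```python
from typing import Callable, List


def full_context_rollout(observations: List[str]) -> List[int]:
    """Baseline: append every observation; record context size per turn."""
    context = ""
    sizes = []
    for obs in observations:
        context += obs  # nothing is ever discarded
        sizes.append(len(context))
    return sizes


def keep_recent(state: str, obs: str, budget: int = 40) -> str:
    """Hypothetical stand-in for a learned consolidation policy.

    Merges the prior state with the new observation, then keeps only the
    last `budget` characters. A learned policy would instead decide what
    is relevant; this rule only demonstrates the bounded-size property.
    """
    merged = (state + " " + obs).strip()
    return merged[-budget:]


def consolidated_rollout(
    observations: List[str],
    consolidate: Callable[[str, str], str],
) -> List[int]:
    """Replace the state each turn instead of appending to it."""
    state = ""
    sizes = []
    for obs in observations:
        state = consolidate(state, obs)  # prior memory + new observation
        sizes.append(len(state))
    return sizes
```

With five 30-character observations, the baseline context grows linearly with the number of turns, while the consolidated state never exceeds the 40-character budget regardless of how many turns follow.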