Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking any learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns to perform structured memory operations {ADD, UPDATE, DELETE, NOOP}, and an Answer Agent that selects the most relevant entries and reasons over them to produce an answer. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management and use with minimal supervision. With as few as 152 question-answer pairs and a corresponding temporal memory bank for training, Memory-R1 outperforms the most competitive existing baseline and demonstrates strong generalization across diverse question types and LLM backbones. Beyond presenting an effective approach, this work provides insights into how RL can unlock more agentic, memory-aware behaviors in LLMs, pointing toward richer, more persistent reasoning systems.
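To make the four memory operations named above concrete, the following is a minimal sketch of how an external memory bank might apply them. It is an illustration under stated assumptions, not the Memory-R1 implementation: the names `MemoryOp`, `MemoryBank`, and `apply`, the integer-keyed storage, and the example entries are all hypothetical. The learned policy that chooses the operation is not shown; here the operations are supplied by hand.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class MemoryOp(Enum):
    """The four operations named in the abstract."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryBank:
    """Hypothetical memory bank: integer ids mapped to text entries."""
    entries: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: MemoryOp, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> None:
        if op is MemoryOp.ADD:
            if text is None:
                raise ValueError("ADD requires text")
            self.entries[self.next_id] = text
            self.next_id += 1
        elif op is MemoryOp.UPDATE:
            if text is None:
                raise ValueError("UPDATE requires text")
            if entry_id not in self.entries:
                raise KeyError(f"no entry with id {entry_id}")
            self.entries[entry_id] = text
        elif op is MemoryOp.DELETE:
            # Deleting a missing entry is treated as a no-op (an assumption).
            self.entries.pop(entry_id, None)
        # NOOP leaves the bank unchanged.


# Illustrative data only; in the framework described above, a trained
# Memory Manager would select the operation for each new piece of information.
bank = MemoryBank()
bank.apply(MemoryOp.ADD, text="User adopted a dog named Buddy.")
bank.apply(MemoryOp.ADD, text="User lives in Boston.")
bank.apply(MemoryOp.UPDATE, text="User adopted two dogs, Buddy and Scout.", entry_id=0)
bank.apply(MemoryOp.DELETE, entry_id=1)
bank.apply(MemoryOp.NOOP)
print(bank.entries)
```

Keeping the operation set small and explicit, as in this sketch, means the manager's decision reduces to choosing one of four discrete actions plus optional arguments, which is the kind of structured output an outcome-driven reward can be attached to.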