Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
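
As a rough illustration of the four memory operations named above, the following Python sketch applies ADD, UPDATE, DELETE, and NOOP to a simple in-memory bank. The `Op` enum, the `MemoryBank` class, its `apply` method, and the integer entry ids are hypothetical names introduced for this example; the abstract does not specify the actual data structures, and the learned policy that chooses which operation to issue is not modeled here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Op(Enum):
    """The four structured operations listed in the abstract."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryBank:
    """Hypothetical external memory: integer id -> stored text."""
    entries: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: Op, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> None:
        if op is Op.ADD:
            if text is None:
                raise ValueError("ADD requires text")
            # Store a new entry under a fresh id.
            self.entries[self.next_id] = text
            self.next_id += 1
        elif op is Op.UPDATE:
            if text is None:
                raise ValueError("UPDATE requires text")
            if entry_id not in self.entries:
                raise KeyError(f"no entry with id {entry_id}")
            # Overwrite an existing entry in place.
            self.entries[entry_id] = text
        elif op is Op.DELETE:
            # Remove the entry if present; a missing id is ignored.
            self.entries.pop(entry_id, None)
        # Op.NOOP falls through and leaves the bank unchanged.


# Example: in the framework described above, a trained Memory Manager
# would choose these operations; here they are hard-coded.
bank = MemoryBank()
bank.apply(Op.ADD, text="User adopted a dog named Buddy.")
bank.apply(Op.ADD, text="User lives in Berlin.")
bank.apply(Op.UPDATE, text="User adopted two dogs, Buddy and Scout.", entry_id=0)
bank.apply(Op.DELETE, entry_id=1)
bank.apply(Op.NOOP)
```

In this sketch the operation is passed in explicitly; the RL-trained component would sit upstream of `apply`, mapping new dialogue content and the current bank to one of the four operations.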
