Abstract
Lifelong learning is essential for intelligent agents operating in dynamic environments. Current large language model (LLM)-based agents, however, remain stateless and unable to accumulate or transfer knowledge over time. Existing benchmarks treat agents as static systems and fail to evaluate lifelong learning capabilities. We present LifelongAgentBench, the first unified benchmark designed to systematically assess the lifelong learning ability of LLM agents. It provides skill-grounded, interdependent tasks across three interactive environments (Database, Operating System, and Knowledge Graph), with automatic label verification, reproducibility, and modular extensibility. Extensive experiments reveal that conventional experience replay has limited effectiveness for LLM agents due to irrelevant information and context length constraints. We further introduce a group self-consistency mechanism that significantly improves lifelong learning performance. We hope LifelongAgentBench will advance the development of adaptive, memory-capable LLM agents.