Abstract
Reinforcement Learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). However, the success of RL for LLMs relies heavily on human-curated datasets and verifiable rewards, which limit its scalability and generality. Recent Self-Play RL methods, inspired by the success of the paradigm in games such as Go, aim to enhance LLM reasoning capabilities without human-annotated data. However, these methods primarily depend on a grounded environment for feedback (e.g., a Python interpreter or a game engine); extending them to general domains remains challenging. To address these challenges, we propose Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general knowledge Q&A. The core design of MAE is a triplet of interacting agents (Proposer, Solver, Judge) instantiated from a single LLM, with reinforcement learning applied to optimize their behaviors. The Proposer generates questions, the Solver attempts solutions, and the Judge evaluates both while co-evolving. Experiments on Qwen2.5-3B-Instruct demonstrate that MAE achieves an average improvement of 4.54% on multiple benchmarks. These results highlight MAE as a scalable, data-efficient method for enhancing the general reasoning abilities of LLMs with minimal reliance on human-curated supervision.