Abstract
LLM-based Multi-Agent Systems (LaMAS) have demonstrated remarkable capabilities in addressing complex, agentic tasks that require multifaceted reasoning and collaboration, from generating high-quality presentation slides to conducting sophisticated scientific research. Meanwhile, Reinforcement Learning (RL) has been widely recognized for its effectiveness in enhancing agent intelligence, but little research has investigated fine-tuning LaMAS with foundational RL techniques. Moreover, directly applying Multi-Agent Reinforcement Learning (MARL) methodologies to LaMAS introduces significant challenges, stemming from the unique characteristics and mechanisms inherent to LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a universal algorithmic framework tailored to LaMAS, outlining its conceptual foundations, key distinctions, and practical implementation strategies. We begin by reviewing the evolution from RL to Reinforcement Fine-Tuning (RFT), setting the stage for a parallel analysis in the multi-agent domain. In the context of LaMAS, we elucidate the critical differences between MARL and MARFT. These differences motivate a transition toward a novel, LaMAS-oriented formulation of RFT. Central to this work is a robust and scalable MARFT framework. We detail its core algorithm and provide a complete, open-source implementation to facilitate adoption and further research. Later sections of the paper explore real-world application perspectives and open challenges in MARFT. By bridging theoretical underpinnings with practical methodologies, this work aims to serve as a roadmap for researchers seeking to advance MARFT toward resilient and adaptive solutions in agentic systems. Our implementation of the proposed framework is publicly available at: https://github.com/jwliao-ai/MARFT.