Abstract
In software evolution, resolving emergent issues in GitHub repositories is a complex challenge that involves not only incorporating new code but also maintaining existing functionality. Large Language Models (LLMs) have shown promise in code generation and understanding but face difficulties with code changes, particularly at the repository level. To overcome these challenges, we empirically study why LLMs mostly fail to resolve GitHub issues and analyze several influencing factors. Motivated by the empirical findings, we propose a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four kinds of agents customized for software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer agents. This framework leverages the collaboration of these agents in the planning and coding process to unlock the potential of LLMs to resolve GitHub issues. In experiments, we employ the SWE-bench benchmark to compare MAGIS with popular LLMs, including GPT-3.5, GPT-4, and Claude-2. MAGIS resolves 13.94% of GitHub issues, significantly outperforming the baselines. Specifically, MAGIS achieves an eight-fold increase in resolved ratio over the direct application of GPT-4, the base LLM of our method. We also analyze the factors that improve GitHub issue resolution rates, such as line location and task allocation.