Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning

Abstract

Large language models (LLMs) exhibit complementary strengths across domains and come with varying inference costs, motivating the design of multi-agent LLM systems where specialized models collaborate efficiently. Existing approaches predominantly rely on decentralized frameworks, which invoke multiple LLMs for every input and thus lead to substantial and uncontrolled inference costs. In this work, we introduce a centralized multi-LLM framework, where a controller LLM selectively coordinates a pool of expert models in a cost-efficient and cost-controllable manner. We formulate this coordination problem as reinforcement learning with dual objectives: maximizing task performance while minimizing the overall inference cost. In addition, we want the multi-agent system to adapt its behavior to different budget conditions during inference. To this end, we propose CoRL, a reinforcement learning framework that optimizes the performance-cost trade-off in a controllable multi-budget setting. Experiments on four diverse benchmarks demonstrate that CoRL enables a single system to surpass the best expert LLM under high-budget settings, while maintaining strong performance in more economical low-budget modes, highlighting the effectiveness of centralized coordination for scalable and cost-efficient multi-agent LLM systems.
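
To make the dual objective concrete, the following is a minimal sketch of one way a budget-conditioned reward could be written. It is an illustrative assumption, not the reward used by CoRL: the `Episode` fields, the `budget_reward` function, the cost units, and the linear overspend penalty are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Outcome of one controller rollout (hypothetical structure)."""
    correct: bool  # whether the final answer was judged correct
    cost: float    # total inference cost of all expert calls, in arbitrary units


def budget_reward(ep: Episode, budget: float, penalty: float = 1.0) -> float:
    """Task reward minus a penalty on cost spent beyond the budget.

    Assumption: spending within the budget is free, and overspend is
    penalized linearly, normalized by the budget itself.
    """
    task = 1.0 if ep.correct else 0.0
    overspend = max(0.0, ep.cost - budget)
    return task - penalty * overspend / max(budget, 1e-8)
```

Under this sketch, the same rollout receives a different reward depending on the budget it is conditioned on, which is the kind of signal that could push a controller to call fewer or cheaper experts in low-budget modes.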
