Abstract
This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with the basic ability to reason about generating multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training in terms of performance, complexity, and efficiency. In this manner, FlowReasoner is able to generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% in accuracy across three benchmarks. The code is available at https://github.com/sail-sg/FlowReasoner.
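
The abstract does not give the functional form of the multi-purpose reward. As an illustrative sketch only, one simple way to fold performance, complexity, and efficiency into a single scalar is a weighted sum with clamped penalties. The function name, the weights, the normalization caps, and the choice of agent count and runtime as proxies for complexity and efficiency are all assumptions made for illustration, not the authors' formulation.

```python
def multi_purpose_reward(pass_rate, num_agents, runtime_s,
                         w_perf=1.0, w_complexity=0.1, w_efficiency=0.1,
                         max_agents=10, max_runtime_s=60.0):
    """Hypothetical scalar reward for one generated multi-agent system.

    pass_rate  : fraction of execution tests passed, in [0, 1] (performance)
    num_agents : number of agents in the generated system (complexity proxy)
    runtime_s  : wall-clock execution time in seconds (efficiency proxy)
    All weights and caps are illustrative assumptions.
    """
    # Normalize both penalties to [0, 1] so the weights are comparable.
    complexity_penalty = min(num_agents / max_agents, 1.0)
    efficiency_penalty = min(runtime_s / max_runtime_s, 1.0)
    # Reward correctness; penalize large and slow systems.
    return (w_perf * pass_rate
            - w_complexity * complexity_penalty
            - w_efficiency * efficiency_penalty)


# Two equally accurate systems: the smaller one receives the higher reward.
small = multi_purpose_reward(pass_rate=1.0, num_agents=2, runtime_s=5.0)
large = multi_purpose_reward(pass_rate=1.0, num_agents=8, runtime_s=5.0)
```

Under these assumed weights, performance dominates the reward, and the complexity and efficiency terms mainly break ties between systems of similar accuracy.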