AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement

Abstract

Scaffolding Large Language Models (LLMs) into multi-agent systems often improves performance on complex tasks, but the safety impact of such scaffolds has not been thoroughly explored. We introduce AgentBreeder, a framework for multi-objective self-improving evolutionary search over scaffolds. We evaluate discovered scaffolds on widely recognized reasoning, mathematics, and safety benchmarks and compare them with popular baselines. In 'blue' mode, we see a 79.4% average uplift in safety benchmark performance while maintaining or improving capability scores. In 'red' mode, we find adversarially weak scaffolds emerging concurrently with capability optimization. Our work demonstrates the risks of multi-agent scaffolding and provides a framework for mitigating them. Code is available at https://github.com/J-Rosser-UK/AgentBreeder.
