APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

  • 2025-04-08 18:46:44
  • Akshara Prabhakar, Zuxin Liu, Ming Zhu, Jianguo Zhang, Tulika Awalgaonkar, Shiyu Wang, Zhiwei Liu, Haolin Chen, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Weiran Yao, Huan Wang, Silvio Savarese, Caiming Xiong

Abstract

Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4 and project website is https://apigen-mt.github.io
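
The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Blueprint`, `generate_blueprint`, and `simulate_trajectory` are hypothetical, the proposer, reviewers, simulated human, and agent are passed in as plain callables standing in for LLM calls, and both the majority-vote acceptance rule and the exact-match check against the ground-truth actions are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Blueprint:
    """Hypothetical container for a task blueprint."""
    instruction: str
    ground_truth_actions: List[str]


# A reviewer returns (approved, feedback_note). Stand-in for an LLM reviewer.
Reviewer = Callable[[Blueprint], Tuple[bool, str]]


def generate_blueprint(
    propose: Callable[[List[str]], Blueprint],
    reviewers: List[Reviewer],
    max_rounds: int = 3,
) -> Optional[Blueprint]:
    """Phase 1 sketch: propose a blueprint, have a committee review it,
    and feed rejection notes back into the next proposal."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)
        verdicts = [review(candidate) for review in reviewers]
        approvals = sum(ok for ok, _ in verdicts)
        # Assumption: a simple majority of reviewers accepts the blueprint.
        if approvals * 2 > len(reviewers):
            return candidate
        feedback = [note for ok, note in verdicts if not ok]
    return None  # no blueprint passed review within the round budget


def simulate_trajectory(
    blueprint: Blueprint,
    human_turn: Callable[[str, list], Optional[str]],
    agent_turn: Callable[[list], Tuple[str, List[str]]],
    max_turns: int = 10,
) -> Optional[list]:
    """Phase 2 sketch: alternate simulated human and agent turns, then keep
    the trajectory only if the agent's actions match the blueprint."""
    history: list = []
    actions: List[str] = []
    for _ in range(max_turns):
        user_msg = human_turn(blueprint.instruction, history)
        if user_msg is None:  # the simulated human ends the conversation
            break
        history.append(("user", user_msg))
        reply, new_actions = agent_turn(history)
        history.append(("agent", reply))
        actions.extend(new_actions)
    # Assumption: verification is an exact match on the action sequence.
    return history if actions == blueprint.ground_truth_actions else None
```

In this sketch, a trajectory that fails the final check is discarded rather than repaired, which is one simple way to keep the generated data verifiable against the blueprint.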

 
