Abstract
Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models, the xLAM-2-fc-r series, with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4, and the project website is https://apigen-mt.github.io.
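
The two-phase flow described above can be illustrated with a minimal Python sketch. It is not the APIGen-MT implementation: every name here (`Blueprint`, `generate_blueprint`, `simulate_trajectory`, the toy proposer, reviewers, and agent) is hypothetical, plain functions stand in for LLM calls, and both the majority-vote acceptance rule and the exact-match check against ground-truth actions are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Blueprint:
    instruction: str    # what the simulated user wants
    actions: List[str]  # ground-truth actions that satisfy the instruction


# Hypothetical stand-ins for LLM calls.
Proposer = Callable[[List[str]], Blueprint]         # feedback -> new draft
Reviewer = Callable[[Blueprint], Tuple[bool, str]]  # draft -> (approve, comment)
Agent = Callable[[str], List[str]]                  # user message -> actions taken
Turn = Tuple[str, List[str]]


def generate_blueprint(propose: Proposer, reviewers: List[Reviewer],
                       max_rounds: int = 3) -> Optional[Blueprint]:
    """Phase 1 (sketch): redraft until a majority of reviewers approve."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        draft = propose(feedback)
        votes = [review(draft) for review in reviewers]
        approvals = sum(1 for ok, _ in votes if ok)
        if approvals * 2 > len(votes):  # assumed rule: strict majority
            return draft
        # Feed rejection comments into the next proposal round.
        feedback = feedback + [comment for ok, comment in votes if not ok]
    return None


def simulate_trajectory(blueprint: Blueprint, user_turns: List[str],
                        agent: Agent) -> Optional[List[Turn]]:
    """Phase 2 (sketch): roll out a dialogue and keep it only if the
    agent's actions match the blueprint's ground-truth actions."""
    turns: List[Turn] = []
    taken: List[str] = []
    for message in user_turns:
        actions = agent(message)
        turns.append((message, actions))
        taken.extend(actions)
    return turns if taken == blueprint.actions else None


# --- Toy usage with deterministic stand-ins ---
def toy_propose(feedback: List[str]) -> Blueprint:
    # The first draft omits an action; the redraft adds it after feedback.
    actions = ["find_order", "cancel_order"] if feedback else ["find_order"]
    return Blueprint("Cancel my latest order", actions)


def needs_cancel(bp: Blueprint) -> Tuple[bool, str]:
    ok = "cancel_order" in bp.actions
    return ok, ("" if ok else "missing cancel_order")


def nonempty(bp: Blueprint) -> Tuple[bool, str]:
    return len(bp.actions) > 0, "empty action list"


def toy_agent(message: str) -> List[str]:
    scripted = {
        "Cancel my latest order": ["find_order"],
        "Yes, go ahead": ["cancel_order"],
    }
    return scripted.get(message, [])


blueprint = generate_blueprint(toy_propose, [needs_cancel, needs_cancel, nonempty])
trajectory = None
if blueprint is not None:
    trajectory = simulate_trajectory(
        blueprint, ["Cancel my latest order", "Yes, go ahead"], toy_agent
    )
```

In this toy run the first draft is rejected by two of the three reviewers, their comments trigger a corrected redraft, and the resulting two-turn dialogue is kept because the agent's actions match the blueprint. A rollout whose actions diverge from the ground truth would return `None` and be discarded, which is one simple way to read "verifiable" in the description above.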