Abstract
Large Language Models (LLMs) with API-calling capabilities have enabled building effective Language Agents (LA), while also revolutionizing the conventional task-oriented dialogue (TOD) paradigm. However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs, requiring new data to maintain their quality when interfacing with new services, while LAs are not trained to maintain user intent over multi-turn conversations. Because both robust multi-turn management and advanced function calling are crucial for effective conversational agents, we evaluate these skills on three popular benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA). Our analyses reveal that specialized approaches excel in one domain but underperform in the other. To bridge this chasm, we introduce CoALM (Conversational Agentic Language Model), a unified approach that integrates both conversational and agentic capabilities. We created CoALM-IT, a carefully constructed multi-task dataset that interleaves multi-turn ReAct reasoning with complex API usage. Using CoALM-IT, we train three models, CoALM 8B, CoALM 70B, and CoALM 405B, which outperform top domain-specific models, including GPT-4o, across all three benchmarks. This demonstrates the feasibility of a single-model approach for both TOD and LA, setting a new standard for conversational agents.