Abstract
Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating human language, contributing to more natural interactions with complex systems. However, they still struggle with ambiguity in user requests. To address this challenge, this paper introduces and evaluates a multi-agent debate framework designed to enhance ambiguity detection and resolution capabilities beyond those of single models. The framework employs three LLM architectures (Llama3-8B, Gemma2-9B, and Mistral-7B variants) and a dataset with diverse ambiguities. The debate framework markedly enhanced the performance of the Llama3-8B and Mistral-7B variants over their individual baselines, with Mistral-7B-led debates achieving a 76.7% success rate and proving particularly effective at resolving complex ambiguities and reaching consensus efficiently. While model responses to collaborative strategies vary, these findings underscore the debate framework's value as a targeted method for augmenting LLM capabilities. This work offers insights for developing more robust and adaptive language understanding systems by showing how structured debates can lead to improved clarity in interactive systems.