When Two LLMs Debate, Both Think They'll Win

  • 2025-06-09 18:54:25
  • Pradyumna Shyama Prasad, Minh Nhat Nguyen

Abstract

Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting, uniquely combining two realistic factors: (a) a multi-turn format requiring models to update beliefs as new information emerges, and (b) a zero-sum structure to control for task-related uncertainty, since mutual high-confidence claims imply systematic overconfidence. We organized 60 three-round policy debates among ten state-of-the-art LLMs, with models privately rating their confidence (0-100) in winning after each round. We observed five concerning patterns: (1) Systematic overconfidence: models began debates with average initial confidence of 72.9% vs. a rational 50% baseline. (2) Confidence escalation: rather than reducing confidence as debates progressed, debaters increased their win probabilities, averaging 83% by the final round. (3) Mutual overestimation: in 61.7% of debates, both sides simultaneously claimed >=75% probability of victory, a logical impossibility. (4) Persistent self-debate bias: models debating identical copies increased confidence from 64.1% to 75.2%; even when explicitly informed their chance of winning was exactly 50%, confidence still rose (from 50.0% to 57.1%). (5) Misaligned private reasoning: models' private scratchpad thoughts sometimes differed from their public confidence ratings, raising concerns about faithfulness of chain-of-thought reasoning. These results suggest LLMs lack the ability to accurately self-assess or update their beliefs in dynamic, multi-turn tasks, a major concern as LLMs are now increasingly deployed without careful review in assistant and agentic roles. Code for our experiments is available at https://github.com/pradyuprasad/llms_overconfidence
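
The zero-sum argument in the abstract can be made concrete with a small sketch. Exactly one side wins a debate, so two calibrated win probabilities should sum to about 100; any surplus indicates overconfidence on at least one side. The Python below is only an illustration of that arithmetic, not the authors' code: the `DebateConfidence` type, its field names, and the sample numbers are all hypothetical, and the 75 threshold simply mirrors the ">=75%" figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class DebateConfidence:
    """Self-reported win confidence (0-100) for one debate. Hypothetical type."""
    pro: float  # confidence of the side arguing for the motion
    con: float  # confidence of the side arguing against it


def excess_confidence(d: DebateConfidence) -> float:
    """Points by which the two confidences exceed the zero-sum total of 100."""
    return d.pro + d.con - 100.0


def mutually_overestimated(d: DebateConfidence, threshold: float = 75.0) -> bool:
    """True when both sides claim at least `threshold` percent chance of winning."""
    return d.pro >= threshold and d.con >= threshold


def summarize(debates: list[DebateConfidence]) -> tuple[float, float]:
    """Return (mean confidence per debater, share of mutually overestimated debates)."""
    n = len(debates)
    mean_conf = sum(d.pro + d.con for d in debates) / (2 * n)
    share = sum(mutually_overestimated(d) for d in debates) / n
    return mean_conf, share


# Made-up sample data, for illustration only (not results from the paper).
sample = [
    DebateConfidence(pro=80, con=75),
    DebateConfidence(pro=60, con=55),
    DebateConfidence(pro=90, con=70),
]
mean_conf, share = summarize(sample)
```

A mean confidence per debater above 50 or a non-zero share of mutually overestimated debates is the signal the abstract describes; because the outcome is zero-sum, no ground-truth judge is needed to detect it.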

 
