Abstract
Accurate confidence calibration in Large Language Models (LLMs) is critical for safe use in high-stakes domains, where clear verbalized confidence enhances user trust. Traditional methods that mimic reference confidence expressions often fail to capture the reasoning needed for accurate confidence assessment. We propose natural language critiques as a solution, ideally suited for confidence calibration, as precise gold confidence labels are hard to obtain and often require multiple generations. This paper studies how natural language critiques can enhance verbalized confidence, addressing: (1) What to critique: uncertainty (question-focused) or confidence (answer-specific)? Our analysis shows that confidence suits multiple-choice tasks, while uncertainty excels in open-ended scenarios. (2) How to critique: self-critique or critique calibration training? We propose Self-Critique, enabling LLMs to critique and optimize their confidence beyond mere accuracy, and CritiCal, a novel Critique Calibration training method that leverages natural language critiques to improve confidence calibration, moving beyond direct numerical optimization. Experiments show that CritiCal significantly outperforms Self-Critique and other competitive baselines, even surpassing its teacher model, GPT-4o, in complex reasoning tasks. CritiCal also shows robust generalization in out-of-distribution settings, advancing the reliability of LLMs.