Abstract
As natural language becomes the default interface for human-AI interaction, there is a critical need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence about their responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are reluctant to express uncertainties when answering questions, even when they produce incorrect responses. LMs can be explicitly prompted to express confidences, but tend to be overconfident, resulting in high error rates (on average 47%) among confident responses. We test the risks of LM overconfidence by running human experiments and show that users rely heavily on LM generations, whether or not they are marked by certainty. Lastly, we investigate the preference-annotated datasets used in RLHF alignment and find that humans have a bias against texts with uncertainty. Our work highlights a new set of safety harms facing human-LM interactions and proposes design recommendations and mitigating strategies moving forward.