Abstract
The last decade in deep learning has brought about increasingly capable systems that are deployed in a wide variety of applications. In natural language processing, the field has been transformed by a number of breakthroughs, including large language models, which are used in a growing number of user-facing applications. In order to reap the benefits of this technology and reduce potential harms, it is important to quantify the reliability of model predictions and the uncertainties that shroud their development. This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective, and how it can be reduced and quantified through the design of the experimental pipeline. We further explore uncertainty quantification in modeling by theoretically and empirically investigating the effect of inductive model biases in text classification tasks. The corresponding experiments include data for three different languages (Danish, English and Finnish) and tasks as well as a large set of different uncertainty quantification approaches. Additionally, we propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction, which provides tighter token sets with better coverage of the actual continuation. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors, where the confidence is predicted solely from the input to the target model and its generated output text.