Selective Generation for Controllable Language Models

Abstract

Trustworthiness of generative language models (GLMs) is crucial in their deployment to critical decision-making systems. Hence, certified risk control methods such as selective prediction and conformal prediction have been applied to mitigate the hallucination problem in various supervised downstream tasks. However, the lack of an appropriate correctness metric hinders applying such principled methods to language generation tasks. In this paper, we circumvent this problem by leveraging the concept of textual entailment to evaluate the correctness of the generated sequence, and propose two selective generation algorithms which control the false discovery rate with respect to the textual entailment relation (FDR-E) with a theoretical guarantee: $\texttt{SGen}^{\texttt{Sup}}$ and $\texttt{SGen}^{\texttt{Semi}}$. $\texttt{SGen}^{\texttt{Sup}}$, a direct modification of selective prediction, is a supervised learning algorithm which exploits entailment-labeled data annotated by humans. Since human annotation is costly, we further propose a semi-supervised version, $\texttt{SGen}^{\texttt{Semi}}$, which fully utilizes the unlabeled data by pseudo-labeling, leveraging an entailment set function learned via conformal prediction. Furthermore, $\texttt{SGen}^{\texttt{Semi}}$ enables the use of a more general class of selection functions, neuro-selection functions, and provides users with an optimal selection function class given multiple candidates. Finally, we demonstrate the efficacy of the $\texttt{SGen}$ family in achieving a desired FDR-E level with selection efficiency comparable to that of baselines on both open- and closed-source GLMs. Code and datasets are provided at https://github.com/ml-postech/selective-generation.
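To make the idea of controlling FDR-E by selective generation more concrete, the following is a minimal sketch of threshold calibration on entailment-labeled data. It is not the authors' $\texttt{SGen}^{\texttt{Sup}}$ algorithm: the function names, the Hoeffding bound, the fixed threshold grid, the Bonferroni correction over that grid, and the "IDK" abstention string are all assumptions chosen for illustration. The selector answers only when a confidence score reaches a threshold, and the threshold is the smallest grid value whose upper-bounded error rate among selected calibration examples is at most a target level.

```python
import math


def hoeffding_upper_bound(errors, n, delta):
    """Upper confidence bound on an error rate from n samples (Hoeffding).

    Returns 1.0 when no sample was selected, so empty selections never qualify.
    """
    if n == 0:
        return 1.0
    return min(1.0, errors / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))


def calibrate_threshold(scores, entailed, epsilon, delta, grid_size=10):
    """Return the smallest grid threshold whose bounded FDR-E is <= epsilon.

    scores:   confidence score in [0, 1] per calibration example (assumed given)
    entailed: True if the generated answer is entailed by the reference answer
    epsilon:  target FDR-E level
    delta:    allowed failure probability, split evenly over the fixed grid
    Returns None if no threshold on the grid qualifies.
    """
    grid = [i / grid_size for i in range(grid_size + 1)]
    per_threshold_delta = delta / len(grid)  # Bonferroni over the fixed grid
    for tau in grid:  # ascending, so the first feasible tau selects the most
        selected = [e for s, e in zip(scores, entailed) if s >= tau]
        errors = sum(1 for e in selected if not e)
        bound = hoeffding_upper_bound(errors, len(selected), per_threshold_delta)
        if bound <= epsilon:
            return tau
    return None


def selective_generate(answer, score, tau):
    """Return the answer if the selector accepts it, otherwise abstain."""
    if tau is not None and score >= tau:
        return answer
    return "IDK"


# Toy calibration set: confident answers are entailed, unconfident ones are not.
cal_scores = [0.9] * 200 + [0.1] * 200
cal_entailed = [True] * 200 + [False] * 200
tau = calibrate_threshold(cal_scores, cal_entailed, epsilon=0.2, delta=0.1)
```

Scanning thresholds in ascending order favors selection efficiency, since a lower threshold abstains less often. A tighter bound such as an exact binomial tail could replace Hoeffding without changing the structure of the sketch.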
