Bayesian Preference Elicitation with Language Models

Abstract

Aligning AI systems to users' interests requires understanding and incorporating humans' complex values and preferences. Recently, language models (LMs) have been used to gather information about the preferences of human users. This preference data can be used to fine-tune or guide other LMs and/or AI systems. However, LMs have been shown to struggle with crucial aspects of preference learning: quantifying uncertainty, modeling human mental states, and asking informative questions. These challenges have been addressed in other areas of machine learning, such as Bayesian Optimal Experimental Design (BOED), which focuses on designing informative queries within a well-defined feature space. But these methods, in turn, are difficult to scale and apply to real-world problems where simply identifying the relevant features can be difficult. We introduce OPEN (Optimal Preference Elicitation with Natural language), a framework that uses BOED to guide the choice of informative questions and an LM to extract features and translate abstract BOED queries into natural language questions. By combining the flexibility of LMs with the rigor of BOED, OPEN can optimize the informativity of queries while remaining adaptable to real-world domains. In user studies, we find that OPEN outperforms existing LM- and BOED-based methods for preference elicitation.
