Multilingual Trolley Problems for Language Models

Abstract

As large language models (LLMs) are deployed in more and more real-world situations, it is crucial to understand their decision-making when faced with moral dilemmas. Inspired by a large-scale cross-cultural study of human moral preferences, "The Moral Machine Experiment", we set up the same set of moral choices for LLMs. We translate 1K vignettes of moral dilemmas, parametrically varied across key axes, into 100+ languages, and reveal the preferences of LLMs in each of these languages. We then compare the responses of LLMs to those of human speakers of those languages, harnessing a dataset of 40 million human moral judgments. We discover that LLMs are more aligned with human preferences in languages such as English, Korean, Hungarian, and Chinese, but less aligned in languages such as Hindi and Somali (in Africa). Moreover, we characterize the explanations LLMs give for their moral choices and find that fairness is the most dominant supporting reason behind GPT-4's decisions, and utilitarianism behind GPT-3's. We also discover "language inequality" (which we define as the model's different development levels in different languages) in a series of meta-properties of moral decision making.
