Language Model Alignment in Multilingual Trolley Problems

Abstract

We evaluate the moral alignment of large language models (LLMs) with human preferences in multilingual trolley problems. Building on the Moral Machine experiment, which captures over 40 million human judgments across 200+ countries, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. This dataset enables the assessment of LLMs' decision-making processes in diverse linguistic contexts. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions: species, gender, fitness, status, age, and the number of lives involved. By correlating these preferences with the demographic distribution of language speakers and examining the consistency of LLM responses to various prompt paraphrasings, our findings provide insights into cross-lingual and ethical biases of LLMs and their intersection. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems and highlighting the importance of incorporating diverse perspectives in AI ethics. The results underscore the need for further research on the integration of multilingual dimensions in responsible AI research to ensure fair and equitable AI interactions worldwide. Our code and data are at https://github.com/causalNLP/moralmachine
