Comparing Moral Values in Western English-speaking societies and LLMs with Word Associations

Abstract

As the impact of large language models increases, understanding the moral values they reflect becomes ever more important. Assessing the nature of moral values as understood by these models via direct prompting is challenging, both because human norms may leak into model training data and because model responses are sensitive to prompt formulation. Instead, we propose to use word associations, which have been shown to reflect moral reasoning in humans, as low-level underlying representations to obtain a more robust picture of LLMs' moral reasoning. We study moral differences in associations from Western English-speaking communities and LLMs trained predominantly on English data. First, we create a large dataset of LLM-generated word associations, resembling an existing dataset of human word associations. Next, we propose a novel method to propagate moral values, starting from seed words derived from Moral Foundation Theory, through the human and LLM-generated association graphs. Finally, we compare the resulting moral conceptualizations, highlighting detailed but systematic differences between moral values emerging from English speakers' and LLM associations.
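The idea of spreading moral values from a few seed words through an association graph can be illustrated with a minimal sketch. It assumes classic label propagation with clamped seeds: seed words keep a fixed polarity, and every other word repeatedly takes the weighted mean of its neighbours' scores until the scores stop changing. This is not necessarily the propagation method proposed in the paper. The helper names `symmetrize` and `propagate`, the toy cue-to-response counts, and the seed words `care` (+1) and `harm` (-1) are all hypothetical.

```python
from collections import defaultdict

# Assumption: a generic label-propagation scheme, used only for illustration;
# it is not presented as the paper's own propagation method.


def symmetrize(associations):
    """Turn directed cue -> response association weights into an undirected graph.

    Weights observed in both directions of a word pair are summed.
    """
    graph = defaultdict(dict)
    for cue, responses in associations.items():
        for response, weight in responses.items():
            graph[cue][response] = graph[cue].get(response, 0.0) + weight
            graph[response][cue] = graph[response].get(cue, 0.0) + weight
    return dict(graph)


def propagate(graph, seeds, max_iter=1000, tol=1e-9):
    """Spread seed polarities through the graph.

    Seed words stay clamped to their given score; each other word is updated
    to the association-weighted mean of its neighbours' scores. Iteration
    stops once the largest change falls below `tol`.
    """
    scores = {word: seeds.get(word, 0.0) for word in graph}
    for _ in range(max_iter):
        updated = {}
        delta = 0.0
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = seeds[word]  # seed words keep their fixed value
                continue
            total = sum(neighbours.values())
            value = (
                sum(weight * scores[nbr] for nbr, weight in neighbours.items()) / total
                if total
                else 0.0
            )
            updated[word] = value
            delta = max(delta, abs(value - scores[word]))
        scores = updated
        if delta < tol:
            break
    return scores


# Hypothetical toy data: how often each response was given for each cue.
associations = {
    "kindness": {"care": 3, "nice": 2},
    "nice": {"kindness": 1, "good": 1},
    "cruel": {"harm": 4, "mean": 1},
    "mean": {"cruel": 2, "nice": 1},
}
# Hypothetical seeds for a single care/harm dimension.
seeds = {"care": 1.0, "harm": -1.0}

scores = propagate(symmetrize(associations), seeds)
for word, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{word:10s} {score:+.3f}")
```

Words associated mainly with `care` (such as `kindness`) end up with positive scores, while words tied to `harm` (such as `cruel`) end up negative. Running the same procedure separately on a human and an LLM-generated association graph would yield two score maps that can then be compared word by word.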
