Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

Abstract

Prior research in representation engineering has revealed that LLMs encode concepts within their representation spaces, predominantly centered around English. In this study, we extend this philosophy to a multilingual scenario, delving into multilingual human value concepts in LLMs. Through our comprehensive exploration covering 7 types of human values, 16 languages and 3 LLM series with distinct multilinguality, we empirically substantiate the existence of multilingual human values in LLMs. Further cross-lingual analysis of these concepts discloses 3 traits arising from language resource disparities: cross-lingual inconsistency, distorted linguistic relationships, and unidirectional cross-lingual transfer between high- and low-resource languages, all in terms of human value concepts. Additionally, we validate the feasibility of cross-lingual control over the value alignment capabilities of LLMs, leveraging the dominant language as a source language. Drawing from our findings on multilingual value alignment, we prudently provide suggestions on the composition of multilingual data for LLM pre-training: include a limited number of dominant languages for cross-lingual alignment transfer while avoiding their excessive prevalence, and keep a balanced distribution of non-dominant languages. We hope that our findings will contribute to enhancing the safety and utility of multilingual AI.
