Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

  • 2024-04-16 08:29:36
  • Shaoyang Xu, Weilong Dong, Zishan Guo, Xinwei Wu, Deyi Xiong

Abstract

Prior research in representation engineering has revealed that LLMs encode concepts within their representation spaces, predominantly centered around English. In this study, we extend this philosophy to a multilingual scenario, delving into multilingual human value concepts in LLMs. Through our comprehensive exploration covering 7 types of human values, 16 languages and 3 LLM series with distinct multilinguality, we empirically substantiate the existence of multilingual human values in LLMs. Further cross-lingual analysis on these concepts discloses 3 traits arising from language resource disparities: cross-lingual inconsistency, distorted linguistic relationships, and unidirectional cross-lingual transfer between high- and low-resource languages, all in terms of human value concepts. Additionally, we validate the feasibility of cross-lingual control over value alignment capabilities of LLMs, leveraging the dominant language as a source language. Drawing from our findings on multilingual value alignment, we prudently provide suggestions on the composition of multilingual data for LLM pre-training: including a limited number of dominant languages for cross-lingual alignment transfer while avoiding their excessive prevalence, and keeping a balanced distribution of non-dominant languages. We hope that our findings will contribute to enhancing the safety and utility of multilingual AI.

 
