Abstract
Despite recent strides in large language models (LLMs), studies have underscored the existence of social biases within these systems. In this paper, we validate and compare the ethical biases of LLMs on globally discussed and potentially sensitive topics, hypothesizing that these biases may arise from language-specific distinctions. We introduce the Multilingual Sensitive Questions & Answers Dataset (MSQAD): we collected news articles from Human Rights Watch covering 17 topics and generated socially sensitive questions, along with corresponding responses, in multiple languages. We scrutinized the biases of these responses across languages and topics using two statistical hypothesis tests. The null hypotheses were rejected in most cases, indicating biases arising from cross-language differences. This demonstrates that ethical biases in responses are widespread across languages and, notably, that these biases were prevalent even across different LLMs. By making MSQAD openly available, we aim to facilitate future research on cross-language biases in LLMs and their variant models.
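The abstract does not name its two hypothesis tests, so the following is only an illustrative sketch of the kind of null hypothesis involved: "the rate at which a model gives a given type of response does not depend on the question's language." It uses a two-sided permutation test, which is a swapped-in technique and not necessarily one of the paper's tests. The binary labels (1 = response judged acceptable, 0 = not) and the function names `acceptance_rate` and `permutation_test` are hypothetical.

```python
import random
from typing import Sequence


def acceptance_rate(labels: Sequence[int]) -> float:
    """Fraction of responses labeled 1 (hypothetical 'acceptable' label)."""
    return sum(labels) / len(labels)


def permutation_test(a: Sequence[int], b: Sequence[int],
                     n_resamples: int = 5000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in acceptance rate.

    H0 (assumed for illustration): language labels are exchangeable, so the
    acceptance rate does not depend on which language a question was asked in.
    Returns an estimated p-value in (0, 1].
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = abs(acceptance_rate(a) - acceptance_rate(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign responses to the two languages
        diff = abs(acceptance_rate(pooled[:n_a]) - acceptance_rate(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the p-value estimate strictly positive.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Toy, made-up labels for two languages (not MSQAD data).
    lang_x = [1] * 80 + [0] * 20
    lang_y = [1] * 50 + [0] * 50
    print(permutation_test(lang_x, lang_y))
```

A small p-value would lead to rejecting H0 for that language pair; repeating the test per topic and per model mirrors, in spirit, the cross-language and cross-topic comparisons the abstract describes.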