Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

Abstract

Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models' stance towards politicians varies strongly across the analyzed languages. We find that while some words, such as dead and designated, are associated with both male and female politicians, a few specific words, such as beautiful and divorced, are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
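The abstract does not define how word usage is turned into a gender association score. As an illustration only, one common way to compare generated-word counts between two groups is a smoothed log ratio of relative frequencies. The sketch below assumes that measure, which is not necessarily the one used in the paper. The function name `gender_association`, the smoothing parameter `alpha`, and the toy word lists are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def gender_association(male_words, female_words, alpha=1.0):
    """Hypothetical score (an assumption, not the paper's metric): the
    add-alpha smoothed log ratio of a word's relative frequency among words
    generated around female vs. male politician names.
    Positive values lean female, negative values lean male."""
    male = Counter(male_words)
    female = Counter(female_words)
    vocab = set(male) | set(female)
    # Add-alpha smoothing keeps words unseen for one gender from giving log(0).
    m_total = sum(male.values()) + alpha * len(vocab)
    f_total = sum(female.values()) + alpha * len(vocab)
    scores = {}
    for word in vocab:
        p_f = (female[word] + alpha) / f_total
        p_m = (male[word] + alpha) / m_total
        scores[word] = math.log(p_f / p_m)
    return scores


if __name__ == "__main__":
    # Invented toy inputs for illustration; not results from the study.
    male = ["dead", "designated", "elected", "dead"]
    female = ["dead", "designated", "beautiful", "divorced", "beautiful"]
    for word, score in sorted(gender_association(male, female).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(f"{word:12s} {score:+.3f}")
```

Smoothing matters here because many generated words appear for only one gender; without it, such words would get infinite scores regardless of how rarely they occur.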
