Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

Abstract

While the prevalence of large pre-trained language models has led to significant improvements in the performance of NLP systems, recent research has demonstrated that these models inherit societal biases extant in natural language. In this paper, we explore a simple method to probe pre-trained language models for gender bias, which we use to effect a multi-lingual study of gender bias towards politicians. We construct a dataset of 250k politicians from most countries in the world and quantify adjective and verb usage around those politicians' names as a function of their gender. We conduct our study in 7 languages across 6 different language modeling architectures. Our results demonstrate that stance towards politicians in pre-trained language models is highly dependent on the language used. Finally, contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
