Assessing Gender Bias in Machine Translation -- A Case Study with Google Translate

Abstract

Recently there has been growing concern about machine bias, where trained statistical models come to reflect controversial societal asymmetries, such as gender or racial bias. A significant number of AI tools have recently been suggested to be harmfully biased towards some minority, with reports of racist criminal behavior predictors, the iPhone X failing to differentiate between two Asian people, and Google Photos mistakenly classifying black people as gorillas. Although a systematic study of such biases can be difficult, we believe that automated translation tools can be exploited through gender-neutral languages to yield a window into the phenomenon of gender bias in AI. In this paper, we start with a comprehensive list of job positions from the U.S. Bureau of Labor Statistics (BLS) and use it to build sentences in constructions like "He/She is an Engineer" in 12 different gender-neutral languages such as Hungarian, Chinese, Yoruba, and several others. We translate these sentences into English using the Google Translate API and collect statistics about the frequency of female, male, and gender-neutral pronouns in the translated output. We show that Google Translate (GT) exhibits a strong tendency towards male defaults, in particular for fields linked to unbalanced gender distribution such as STEM jobs. We compare these statistics against BLS data for the frequency of female participation in each job position, showing that GT fails to reproduce a real-world distribution of female workers. We provide experimental evidence that even if one does not expect in principle a 50:50 pronominal gender distribution, GT yields male defaults much more frequently than what would be expected from demographic data alone. We are hopeful that this work will ignite a debate about the need to augment current statistical translation tools with debiasing techniques which can already be found in the scientific literature.
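The pronoun-counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pronoun sets, the function names `classify_pronoun` and `pronoun_frequencies`, and the sample sentences are all assumptions made for illustration, and the sample sentences are not real Google Translate output. The sketch assumes the translated English sentences have already been collected.

```python
import re
from collections import Counter

# Illustrative pronoun sets (an assumption, not taken from the paper).
FEMALE = {"she", "her", "hers"}
MALE = {"he", "him", "his"}
NEUTRAL = {"they", "them", "their", "it"}

LABELS = ("female", "male", "neutral", "none")


def classify_pronoun(sentence: str) -> str:
    """Return the gender class of the first pronoun found in an English sentence."""
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in FEMALE:
            return "female"
        if token in MALE:
            return "male"
        if token in NEUTRAL:
            return "neutral"
    return "none"


def pronoun_frequencies(translations):
    """Return the relative frequency of each pronoun class over the translations."""
    counts = Counter(classify_pronoun(s) for s in translations)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Hypothetical translated sentences, made up for this example.
sample = [
    "He is an engineer.",
    "She is a nurse.",
    "He is a doctor.",
    "They are a teacher.",
]
freqs = pronoun_frequencies(sample)
print(freqs)
```

Frequencies computed this way per occupation could then be set against the BLS female-participation rate for the same occupation, which is the comparison the abstract describes.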
