Gender Bias in Neural Natural Language Processing

Abstract

We examine whether neural natural language processing (NLP) systems reflect historical biases in training data. We define a general benchmark to quantify gender bias in a variety of neural NLP tasks. Our empirical evaluation with state-of-the-art neural coreference resolution and textbook RNN-based language models trained on benchmark datasets finds significant gender bias in how models view occupations. We then mitigate bias with CDA: a generic methodology for corpus augmentation via causal interventions that breaks associations between gendered and gender-neutral words. We empirically show that CDA effectively decreases gender bias while preserving accuracy. We also explore the space of mitigation strategies with CDA, a prior approach to word embedding debiasing (WED), and their compositions. We show that CDA outperforms WED, drastically so when word embeddings are trained. For pre-trained embeddings, the two methods can be effectively composed. We also find that as training proceeds on the original data set with gradient descent the gender bias grows as the loss reduces, indicating that the optimization encourages bias; CDA mitigates this behavior.
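To make the idea of corpus augmentation via interventions concrete, the following is a minimal sketch and not the paper's implementation. It assumes a tiny hand-written list of gendered word pairs (GENDER_PAIRS is hypothetical), swaps each gendered word for its counterpart, and keeps both the original and the swapped sentence. The function names intervene and augment are illustrative, and details the abstract does not specify (ambiguous words such as "her", proper names, coreference-aware swapping) are deliberately left out.

```python
import re

# Hypothetical, deliberately tiny list of gendered word pairs.
# This is an assumption for illustration, not a list from the paper.
GENDER_PAIRS = [
    ("he", "she"),
    ("man", "woman"),
    ("father", "mother"),
    ("king", "queen"),
]

# Build a bidirectional lookup so each word maps to its counterpart.
SWAP = {}
for a, b in GENDER_PAIRS:
    SWAP[a] = b
    SWAP[b] = a

# Match whole alphabetic tokens so that "woman" is never treated as "man".
TOKEN = re.compile(r"[A-Za-z]+")


def intervene(sentence):
    """Return the sentence with every listed gendered word swapped."""
    def swap(match):
        word = match.group(0)
        target = SWAP.get(word.lower())
        if target is None:
            return word  # gender-neutral token: leave untouched
        # Keep a leading capital if the original token started with one.
        return target.capitalize() if word[0].isupper() else target
    return TOKEN.sub(swap, sentence)


def augment(corpus):
    """Return the corpus with a swapped copy added after each changed sentence."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        counterfactual = intervene(sentence)
        if counterfactual != sentence:
            augmented.append(counterfactual)
    return augmented


if __name__ == "__main__":
    for line in augment(["The man said she left.", "The doctor arrived."]):
        print(line)
```

In this sketch a gender-neutral word such as "doctor" appears equally often next to each member of a listed pair after augmentation, which is the sense in which the association between gendered and gender-neutral words is broken.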
