Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them

Abstract

Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society. This phenomenon is pervasive and consistent across different word embedding models, causing serious concern. Several recent works tackle this problem, and propose methods for significantly reducing this gender bias in word embeddings, demonstrating convincing results. However, we argue that this removal is superficial. While the bias is indeed substantially reduced according to the provided bias definition, the actual effect is mostly hiding the bias, not removing it. The gender bias information is still reflected in the distances between "gender-neutralized" words in the debiased embeddings, and can be recovered from them. We present a series of experiments to support this claim, for two debiasing methods. We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling.
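The central claim, that bias measured along a gender direction can drop to zero while the same information survives in the distances between words, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the three-dimensional vectors are invented, the word list is hypothetical, and the projection-based neutralization is a generic simplification rather than either of the two debiasing methods the paper evaluates.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def remove_direction(v, g):
    # Subtract the component of v along the unit vector g.
    p = dot(v, g)
    return [a - p * b for a, b in zip(v, g)]

# Invented toy vectors (assumption): axis 0 carries explicit gender,
# axis 1 carries a correlated "stereotype" feature, axis 2 is unrelated.
he = [1.0, 0.0, 0.0]
she = [-1.0, 0.0, 0.0]
words = {
    "nurse": [-0.5, 0.8, 0.1],
    "receptionist": [-0.4, 0.7, 0.2],
    "engineer": [0.5, -0.8, 0.1],
    "captain": [0.4, -0.7, 0.2],
}

# Gender direction taken as the normalized difference he - she.
g = normalize([a - b for a, b in zip(he, she)])

# "Debias" each word by projecting out the gender direction.
debiased = {w: remove_direction(v, g) for w, v in words.items()}

def nearest(word, space):
    # Most cosine-similar other word in the given embedding space.
    others = (w for w in space if w != word)
    return max(others, key=lambda w: cosine(space[word], space[w]))

for w in debiased:
    # The projection onto g is numerically zero, yet each word's nearest
    # neighbor is still the word that shared its original bias.
    print(w, abs(round(dot(debiased[w], g), 6)), nearest(w, debiased))
```

In this toy space every debiased vector has no component along the gender direction, so a projection-based bias measure reports no bias, but the words that were originally biased the same way remain each other's nearest neighbors. Real embeddings are far higher-dimensional and the correlated features are not confined to a single axis, so this is only a sketch of the mechanism.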
