Abstract
Word embeddings learnt from massive text collections have demonstrated significant levels of discriminative biases, such as gender, racial or ethnic biases, which in turn bias the downstream NLP applications that use those word embeddings. Taking gender bias as a working example, we propose a debiasing method that preserves non-discriminative gender-related information while removing stereotypical discriminative gender biases from pre-trained word embeddings. Specifically, we consider four types of information: \emph{feminine}, \emph{masculine}, \emph{gender-neutral} and \emph{stereotypical}, which represent the relationship between gender and bias, and propose a debiasing method that (a) preserves the gender-related information in feminine and masculine words, (b) preserves the neutrality in gender-neutral words, and (c) removes the biases from stereotypical words. Experimental results on several previously proposed benchmark datasets show that our proposed method debiases pre-trained word embeddings better than existing state-of-the-art debiasing methods while preserving gender-related but non-discriminative information.