ValNorm: A New Word Embedding Intrinsic Evaluation Method Reveals Valence Biases are Consistent Across Languages and Over Decades

Abstract

Word embeddings learn implicit biases from linguistic regularities captured by word co-occurrence information. As a result, statistical methods can detect and quantify social biases, as well as widely shared associations, absorbed from the corpus the word embeddings are trained on. By extending methods that quantify human-like biases in word embeddings, we introduce ValNorm, a new word embedding intrinsic evaluation task and the first unsupervised method that estimates the affective meaning of valence in words with high accuracy. ValNorm scores correlate with human valence ratings for 399 words, collected to establish pleasantness norms in English, at r = 0.88. These 399 words, obtained from the social psychology literature, are used to measure biases that are non-discriminatory among social groups. We hypothesize that the valence associations of these words are widely shared across languages and consistent over time. We estimate the valence associations of these words using word embeddings from six languages representing various language structures, and from historical text covering 200 years. Our method achieves consistently high accuracy, suggesting that the valence associations of these words are widely shared. In contrast, when we measure gender stereotypes using the same set of word embeddings, we find that social biases vary across languages. Our results indicate that the valence associations of this word set represent widely shared associations and, consequently, an intrinsic quality of words.
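
The abstract describes scoring each word's valence from embedding geometry and correlating those scores with human ratings. The sketch below illustrates that pipeline under stated assumptions: it uses a single-category WEAT (SC-WEAT) style effect size against pleasant and unpleasant attribute sets, with a sample standard deviation in the denominator, and then a Pearson correlation. The function names (`sc_weat_effect_size`, `valence_agreement`), the attribute words, the 2-D vectors, and the ratings are all hypothetical illustrations, not the authors' implementation or data, and the toy output is not meant to reproduce r = 0.88.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def sc_weat_effect_size(w, pleasant_vecs, unpleasant_vecs):
    """SC-WEAT-style effect size of one target vector w (assumed formulation).

    Difference between w's mean cosine similarity to the pleasant set A
    and to the unpleasant set B, divided by the sample standard deviation
    of its similarities to every vector in A and B combined.
    """
    sims_a = [cosine(w, a) for a in pleasant_vecs]
    sims_b = [cosine(w, b) for b in unpleasant_vecs]
    return (mean(sims_a) - mean(sims_b)) / stdev(sims_a + sims_b)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def valence_agreement(embeddings, targets, human_valence, pleasant, unpleasant):
    """Hypothetical helper: correlate embedding-derived valence with human ratings.

    Only targets that have both a vector and a human rating are scored.
    Returns (dict of per-word effect sizes, Pearson r).
    """
    a_vecs = [embeddings[w] for w in pleasant]
    b_vecs = [embeddings[w] for w in unpleasant]
    words = [t for t in targets if t in embeddings and t in human_valence]
    scores = {t: sc_weat_effect_size(embeddings[t], a_vecs, b_vecs) for t in words}
    r = pearson_r([scores[t] for t in words], [human_valence[t] for t in words])
    return scores, r


# Hypothetical toy data: 2-D "embeddings" and made-up human valence ratings.
toy_embeddings = {
    "love": [1.0, 0.1], "joy": [0.9, 0.2],      # pleasant attribute words
    "hate": [-1.0, 0.1], "pain": [-0.9, 0.2],   # unpleasant attribute words
    "gift": [0.8, 0.3], "war": [-0.7, 0.4], "table": [0.1, 1.0],
}
toy_ratings = {"gift": 8.0, "war": 1.5, "table": 5.0}

scores, r = valence_agreement(
    toy_embeddings, ["gift", "war", "table"], toy_ratings,
    pleasant=["love", "joy"], unpleasant=["hate", "pain"],
)
print(scores)
print(f"Pearson r = {r:.3f}")
```

With real embeddings, the target list would be the 399-word pleasantness norm set and the ratings the corresponding human scores; the same scoring function could be reused with other attribute sets (for example, gendered word lists) to contrast social biases with valence.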
