ValNorm Quantifies Semantics to Reveal Consistent Valence Biases Across Languages and Over Centuries

Abstract

Word embeddings, which are numeric dictionaries for machines to process language, learn implicit biases from linguistic regularities captured by word co-occurrence information. As a result, statistical methods can detect and quantify social biases along with widely shared associations present in the corpus the word embeddings are trained on. By extending methods that quantify human-like biases in word embeddings, we introduce ValNorm, a novel word embedding intrinsic evaluation task and a method to measure the affective meaning of valence (pleasantness/unpleasantness) in words, with high accuracy. The correlation between human judgment scores of valence for 399 words collected to establish pleasantness norms in English and ValNorm scores is r=0.88. These 399 words, obtained from the social psychology literature, are used to measure biases that are non-discriminatory among social groups. We hypothesize that the valence associations for this set of words (in various translations) are widely shared across languages and consistent over time. We estimate valence associations of these words using word embeddings from seven languages representing various language structures and from historical text covering 200 years. Our method achieves consistently high accuracy, suggesting that the valence associations for these words are widely shared. In contrast, we measure gender stereotypes using the same set of word embeddings and find that social biases vary across languages. Our results signal that valence associations of this word set represent widely shared associations of the last two centuries. Consequently, ValNorm can be used to evaluate valence norms and the accuracy of word embeddings, especially when measuring biases.
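
The abstract does not spell out the scoring formula, so the following is only a minimal sketch of how a valence score of this kind could be computed and compared with human ratings. It assumes a single-category, WEAT-style association effect size (difference in mean cosine similarity to pleasant versus unpleasant attribute words, divided by the standard deviation of all similarities) followed by a Pearson correlation. The function names, the 2-dimensional vectors, the attribute sets, and the human ratings are all hypothetical toy values, not data or code from the paper.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def valence_effect_size(w, pleasant, unpleasant):
    """Assumed single-category association effect size for one target word.

    (mean similarity to pleasant words - mean similarity to unpleasant words)
    divided by the sample standard deviation of all similarities.
    """
    sims_p = np.array([cosine(w, a) for a in pleasant])
    sims_u = np.array([cosine(w, b) for b in unpleasant])
    pooled = np.concatenate([sims_p, sims_u])
    return float((sims_p.mean() - sims_u.mean()) / pooled.std(ddof=1))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical toy embeddings (real embeddings have hundreds of dimensions).
pleasant = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
unpleasant = [np.array([-1.0, 0.1]), np.array([-0.9, 0.2])]
targets = [
    np.array([0.8, 0.3]),   # word near the pleasant set
    np.array([0.0, 1.0]),   # word equidistant from both sets
    np.array([-0.7, 0.2]),  # word near the unpleasant set
]

# Hypothetical human valence ratings for the three target words.
human_ratings = [8.0, 5.0, 2.0]

scores = [valence_effect_size(w, pleasant, unpleasant) for w in targets]
r = pearson_r(scores, human_ratings)
print(scores, r)
```

In this sketch, a higher correlation between the embedding-derived scores and the human ratings would indicate that the embedding captures valence well, which is the sense in which such a task can serve as an intrinsic evaluation.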
