Learning and Evaluating Emotion Lexicons for 91 Languages

  • 2020-05-12 10:32:03
  • Sven Buechel, Susanna Rücker, Udo Hahn

Abstract

Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis. Yet, manually curated lexicons are only available for a handful of languages, leaving most languages of the world without such a precious resource for downstream applications. Even worse, their coverage is often limited both in terms of the lexical units they contain and the emotional variables they feature. In order to break this bottleneck, we here introduce a methodology for creating almost arbitrarily large emotion lexicons for any target language. Our approach requires nothing but a source language emotion lexicon, a bilingual word translation model, and a target language embedding model. Fulfilling these requirements for 91 languages, we are able to generate representationally rich high-coverage lexicons comprising eight emotional variables with more than 100k lexical entries each. We evaluated the automatically generated lexicons against human judgment from 26 datasets, spanning 12 typologically diverse languages, and found that our approach produces results in line with state-of-the-art monolingual approaches to lexicon creation and even surpasses human reliability for some languages and variables. Code and data are available at https://github.com/JULIELab/MEmoLon, archived under DOI https://doi.org/10.5281/zenodo.3779901.

 
