Word Sense Disambiguation for 158 Languages using Word Embeddings Only

  • 2020-03-14 14:50:04
  • Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

Abstract

Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages which do not have any resources for building either supervised and/or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. Models and system are available online.
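
To make the idea of inducing a sense inventory from embeddings alone more concrete, here is a minimal sketch. It is not the system described in the paper: the toy two-dimensional vectors, the function names (`induce_senses`, `disambiguate`), the similarity threshold, and the use of connected components as the clustering step are all assumptions chosen for illustration. The sketch groups the nearest neighbours of a target word into clusters, treats each cluster as one sense, and then picks the sense whose centroid is closest to the context.

```python
import math

# Toy 2-d "embeddings" (hypothetical data). A real setup would load
# pre-trained vectors for the language of interest instead.
VECTORS = {
    "bank": (0.7, 0.7),
    "money": (1.0, 0.1),
    "loan": (0.9, 0.2),
    "cash": (0.95, 0.05),
    "river": (0.1, 1.0),
    "shore": (0.2, 0.9),
    "water": (0.05, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighbours(word, k):
    """Return the k words most similar to `word`, excluding the word itself."""
    others = [w for w in VECTORS if w != word]
    others.sort(key=lambda w: cosine(VECTORS[word], VECTORS[w]), reverse=True)
    return others[:k]


def induce_senses(word, k=6, threshold=0.9):
    """Cluster the neighbours of `word`; each cluster stands for one sense.

    Assumption: two neighbours are linked when their cosine similarity is at
    least `threshold`, and a sense is a connected component of that graph.
    """
    nodes = neighbours(word, k)
    unvisited = set(nodes)
    senses = []
    for start in nodes:
        if start not in unvisited:
            continue
        unvisited.remove(start)
        component, stack = [], [start]
        while stack:
            current = stack.pop()
            component.append(current)
            linked = [n for n in unvisited
                      if cosine(VECTORS[current], VECTORS[n]) >= threshold]
            for n in linked:
                unvisited.remove(n)
                stack.append(n)
        senses.append(sorted(component))
    return senses


def centroid(words):
    """Average the vectors of the given words."""
    vecs = [VECTORS[w] for w in words]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def disambiguate(word, context, senses):
    """Return the index of the sense closest to the context, or None."""
    known = [w for w in context if w in VECTORS and w != word]
    if not known:
        return None
    ctx = centroid(known)
    return max(range(len(senses)),
               key=lambda i: cosine(ctx, centroid(senses[i])))


senses = induce_senses("bank")
financial = disambiguate("bank", ["cash", "loan"], senses)
geographic = disambiguate("bank", ["water", "shore"], senses)
```

In this toy setting the neighbours of "bank" fall into a financial cluster and a geographic cluster, and the context words decide which one is selected. Nothing here needs labelled data or a lexical resource, which is the property the abstract emphasises for under-resourced languages.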

 
