Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages

Abstract

In this paper, we present an approach for translating word embeddings from a majority language into 4 minority languages: Erzya, Moksha, Udmurt and Komi-Zyrian. Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied to endangered language data through the aligned word embeddings. To test our model, we annotated a small sentiment analysis corpus for the 4 endangered languages and Finnish. Our method reached at least 56% accuracy for each endangered language. The models and the sentiment corpus will be released together with this paper. Our research shows that state-of-the-art neural models can be used with endangered languages with the only requirement being a dictionary between the endangered language and a majority language.
