Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Abstract

Sentiment analysis is a widely studied NLP task where the goal is to determine the opinions, emotions, and evaluations of users towards a product, an entity, or a service that they are reviewing. One of the biggest challenges for sentiment analysis is that it is highly language dependent. Word embeddings, sentiment lexicons, and even annotated data are language specific. Further, optimizing models for each language is very time consuming and labor intensive, especially for recurrent neural network models. From a resource perspective, it is very challenging to collect data for different languages. In this paper, we look for an answer to the following research question: can a sentiment analysis model trained on one language be reused for sentiment analysis in other languages (Russian, Spanish, Turkish, and Dutch) where the data is more limited? Our goal is to build a single model in the language with the largest dataset available for the task, and reuse it for languages that have limited resources. For this purpose, we train a sentiment analysis model using recurrent neural networks with reviews in English. We then translate reviews in other languages into English and reuse this model to evaluate their sentiment. Experimental results show that our robust approach of a single model trained on English reviews outperforms the baselines by a statistically significant margin in several different languages.
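The translate-then-reuse idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `toy_translate` and `toy_english_model` are hypothetical stand-ins for a machine translation system and the English-trained recurrent model, and the tiny word lists are invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical word-level dictionaries standing in for a real machine
# translation system (assumption: not part of the paper).
TOY_DICTIONARY: Dict[str, Dict[str, str]] = {
    "es": {"muy": "very", "bueno": "good", "malo": "bad", "servicio": "service"},
    "nl": {"erg": "very", "goed": "good", "slecht": "bad", "eten": "food"},
}

# Hypothetical keyword lists standing in for the English-trained RNN.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def toy_translate(text: str, source_lang: str) -> str:
    """Translate word by word into English; unknown words pass through."""
    table = TOY_DICTIONARY.get(source_lang, {})
    return " ".join(table.get(tok, tok) for tok in text.lower().split())


def toy_english_model(text: str) -> str:
    """Placeholder English sentiment model: counts polarity keywords."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score >= 0 else "negative"


def classify_reviews(
    reviews: List[str],
    source_lang: str,
    translate: Callable[[str, str], str],
    english_model: Callable[[str], str],
) -> List[str]:
    """Translate each review into English, then reuse the single English model."""
    return [english_model(translate(review, source_lang)) for review in reviews]


if __name__ == "__main__":
    spanish = ["servicio muy bueno", "servicio malo"]
    print(classify_reviews(spanish, "es", toy_translate, toy_english_model))
```

The point of the design is that only the `translate` step depends on the source language; the sentiment model is trained once, on English, and shared across all languages.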
