Sentiment Analysis for Sinhala Language using Deep Learning Techniques

Abstract

Owing to the high impact of the fast-evolving fields of machine learning and deep learning, Natural Language Processing (NLP) tasks have achieved strong performance for highly resourced languages such as English and Chinese. However, Sinhala, an under-resourced language with a rich morphology, has not benefited from these advancements. For sentiment analysis, only two previous studies have used deep learning approaches. Both focused solely on binary document-level sentiment analysis and experimented with only three types of deep learning models. In contrast, this paper presents a much more comprehensive study on the use of standard sequence models such as RNN, LSTM, and Bi-LSTM, as well as more recent state-of-the-art models such as hierarchical attention hybrid neural networks and capsule networks. Classification is done at the document level but with finer granularity, considering POSITIVE, NEGATIVE, NEUTRAL, and CONFLICT classes. A data set of 15,059 Sinhala news comments annotated with these four classes, and a corpus of 9.48 million tokens, are publicly released. This is the largest sentiment-annotated data set for Sinhala so far.
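The abstract lists plain RNNs among the baseline sequence models and frames the task as four-way document classification. A minimal, untrained sketch of that setup follows. It assumes a vanilla (Elman) RNN that reads integer token ids and passes its final hidden state through a linear layer and a softmax over the four classes. The class name `TinyRNNClassifier`, its dimensions, and the random initialisation are hypothetical choices for illustration, not the architecture or hyperparameters used in the paper.

```python
import math
import random

# The four document-level classes considered in the abstract.
LABELS = ["POSITIVE", "NEGATIVE", "NEUTRAL", "CONFLICT"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNNClassifier:
    """Hypothetical Elman RNN classifier (illustrative, untrained):
    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), then softmax(W_out h_T + b_out).
    """

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            # Small random weights; a real model would learn these.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.embed = mat(vocab_size, embed_dim)      # token id -> embedding row
        self.w_xh = mat(hidden_dim, embed_dim)       # input-to-hidden weights
        self.w_hh = mat(hidden_dim, hidden_dim)      # hidden-to-hidden weights
        self.b_h = [0.0] * hidden_dim
        self.w_out = mat(len(LABELS), hidden_dim)    # hidden-to-class weights
        self.b_out = [0.0] * len(LABELS)

    @staticmethod
    def _matvec(m, v):
        return [sum(w * x for w, x in zip(row, v)) for row in m]

    def forward(self, token_ids):
        """Run the recurrence over the sequence and return class probabilities."""
        h = [0.0] * self.hidden_dim
        for t in token_ids:
            x = self.embed[t]
            pre = [
                a + b + c
                for a, b, c in zip(self._matvec(self.w_xh, x),
                                   self._matvec(self.w_hh, h),
                                   self.b_h)
            ]
            h = [math.tanh(p) for p in pre]
        logits = [s + b for s, b in zip(self._matvec(self.w_out, h), self.b_out)]
        return softmax(logits)

    def predict(self, token_ids):
        """Return the most probable label and the full probability vector."""
        probs = self.forward(token_ids)
        best = max(range(len(LABELS)), key=lambda i: probs[i])
        return LABELS[best], probs


if __name__ == "__main__":
    # Hypothetical token ids standing in for a tokenised Sinhala news comment.
    model = TinyRNNClassifier(vocab_size=100)
    label, probs = model.predict([3, 17, 42, 8])
    print(label, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted label here is arbitrary. The LSTM, Bi-LSTM, attention, and capsule models named in the abstract would replace the `tanh` recurrence with their own cells or layers while keeping the same four-way softmax output.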
