Analyzing Roles of Classifiers and Code-Mixed factors for Sentiment Identification

  • 2018-03-15 19:31:44
  • Soumil Mandal, Dipankar Das

Abstract

Multilingual speakers often switch between languages to express themselves on social communication platforms. Sometimes the original script of each language is preserved, but using a common script for all the languages is also quite popular because it is convenient. On such occasions, multiple languages with different rules of grammar are mixed in the same script, which makes natural language processing challenging, even for a task such as accurate sentiment identification. In this paper, we report the results of various experiments carried out on a movie review dataset that has this code-mixing property for two languages, English and Bengali, both typed in Roman script. We tested various machine learning algorithms trained only on English features on our code-mixed data and achieved a maximum accuracy of 59.00% using a Naive Bayes (NB) model. We also tested various models trained on code-mixed data as well as English features, and the highest accuracy of 72.50% was obtained by a Support Vector Machine (SVM) model. Finally, we analyzed the misclassified snippets and discuss the challenges that need to be resolved for better accuracy.
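As a rough illustration of the kind of classifier the abstract mentions, the sketch below trains a small multinomial Naive Bayes model on Romanized code-mixed snippets. It is not the authors' implementation: the unigram bag-of-words features, the Laplace smoothing, the `NaiveBayes` class, and the four toy training snippets are all assumptions made up for this example, and the abstract does not say which features or preprocessing were actually used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical English-Bengali snippets in Roman script (not from the paper's dataset).
train_texts = [
    "movie ta khub bhalo great acting",
    "darun story loved it",
    "ekdom baje movie boring",
    "bad acting story ta baje",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("acting khub bhalo"))  # pos
print(model.predict("story ta baje"))      # neg
```

A model trained only on English vocabulary would have to skip tokens such as "bhalo" or "baje" entirely, which gives one intuition for why the English-only setting in the abstract scores lower than models that also see code-mixed training data.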

 
