Sentiment Classification of Customer Reviews about Automobiles in Roman Urdu

Abstract

Text mining is a broad field with sentiment mining as an important constituent, in which we try to deduce the behavior of people towards a specific item, merchandise, politics, sports, social media comments, review sites, etc. Among the many issues in sentiment mining, analysis and classification, one major issue is that reviews and comments can be in different languages such as English, Arabic and Urdu. Handling each language according to its own rules is a difficult task. A lot of research on sentiment analysis and classification has been done for the English language, but only limited work has been carried out on other regional languages such as Arabic, Urdu and Hindi. In this paper, the Waikato Environment for Knowledge Analysis (WEKA) is used as a platform to execute different classification models for text classification of Roman Urdu text. The review dataset was scraped from different automobile sites. The extracted Roman Urdu reviews, comprising 1000 positive and 1000 negative reviews, are saved in the WEKA attribute-relation file format (ARFF) as labeled examples. Training is done on 80% of this data and the rest is used for testing, which is carried out with different models, and the results are analyzed in each case. The results show that Multinomial Naive Bayes outperformed Bagging, Deep Neural Network, Decision Tree, Random Forest, AdaBoost, k-NN and SVM classifiers in terms of accuracy, precision, recall and F-measure.
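The pipeline summarized in the abstract (labeled reviews stored as ARFF, an 80/20 train/test split, a Multinomial Naive Bayes classifier, and accuracy, precision, recall and F-measure as metrics) can be illustrated with a minimal sketch. This is not the authors' WEKA setup: it is a plain-Python approximation, and the toy reviews, the function names, the relation and attribute names, the whitespace tokenizer and the add-one smoothing are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy reviews, invented for illustration (not the paper's dataset).
TOY_REVIEWS = [
    ("gari bohat achi hai", "positive"), ("mileage zabardast hai", "positive"),
    ("engine behtareen hai", "positive"), ("drive bohat smooth hai", "positive"),
    ("achi aur aaramdeh gari", "positive"),
    ("gari bohat kharab hai", "negative"), ("mileage bekar hai", "negative"),
    ("engine kamzor hai", "negative"), ("drive bohat mushkil hai", "negative"),
    ("kharab aur mehngi gari", "negative"),
]

def to_arff(examples, relation="roman_urdu_reviews"):
    """Serialize (text, label) pairs as ARFF text; names here are assumed."""
    lines = [f"@relation {relation}", "", "@attribute text string",
             "@attribute class {positive,negative}", "", "@data"]
    for text, label in examples:
        # Backslash-escape quotes so the string value stays well-formed.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines) + "\n"

def split_80_20(examples, seed=0):
    """Shuffle with a fixed seed, then use 80% for training, 20% for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def evaluate(model, examples, positive="positive"):
    """Accuracy plus precision, recall and F-measure for the positive class."""
    tp = fp = fn = correct = 0
    for text, label in examples:
        pred = model.predict(text)
        correct += pred == label
        tp += pred == positive and label == positive
        fp += pred == positive and label != positive
        fn += pred != positive and label == positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(examples), "precision": precision,
            "recall": recall, "f_measure": f1}

if __name__ == "__main__":
    train, test = split_80_20(TOY_REVIEWS)
    model = MultinomialNB().fit(train)
    print(to_arff(train))
    print(evaluate(model, test))
```

With only ten toy reviews the reported metrics are not meaningful; the sketch is meant only to make the order of the steps concrete.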
