TF-IDFC-RF: A Novel Supervised Term Weighting Scheme

  • 2020-03-12 21:31:46
  • Flavio Carvalho, Gustavo Paiva Guedes
Sentiment Analysis is a branch of Affective Computing usually considered a binary classification task. In this line of reasoning, Sentiment Analysis can be applied in several contexts to classify the attitude expressed in text samples, for example, movie reviews and sarcasm, among others. A common approach to representing text samples is to use the Vector Space Model to compute numerical feature vectors consisting of term weights. The most popular term weighting scheme is TF-IDF (Term Frequency - Inverse Document Frequency). It is an Unsupervised Weighting Scheme (UWS), since it does not consider class information when weighting terms. In contrast, Supervised Weighting Schemes (SWS) do consider class information in the term weighting calculation. Several SWS have recently been proposed, demonstrating better results than TF-IDF. In this scenario, this work presents a comparative study of different term weighting schemes and proposes a novel supervised term weighting scheme named TF-IDFC-RF (Term Frequency - Inverse Document Frequency in Class - Relevance Frequency). The effectiveness of TF-IDFC-RF is validated with SVM (Support Vector Machine) and NB (Naive Bayes) classifiers on four commonly used Sentiment Analysis datasets. TF-IDFC-RF outperforms all other weighting schemes and achieves F1 results of more than 99.9% on all datasets with the SVM classifier.
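The difference between an unsupervised and a supervised weighting scheme can be illustrated with a minimal Python sketch. It is not the TF-IDFC-RF scheme, whose formula the abstract does not state. It contrasts plain TF-IDF with TF-RF, assuming the relevance-frequency form log2(2 + a / max(1, c)) as it is commonly given in the term weighting literature. The toy corpus, the class labels, and the helper names (`doc_freq`, `tf_idf`, `tf_rf`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical data, for illustration only).
docs = [
    ("great movie great cast", "pos"),
    ("great fun", "pos"),
    ("boring movie", "neg"),
    ("boring plot bad cast", "neg"),
]
tokenized = [(text.split(), label) for text, label in docs]
n_docs = len(tokenized)


def doc_freq(term, label=None):
    """Count documents containing `term`, optionally within one class only."""
    return sum(
        1
        for tokens, lab in tokenized
        if term in tokens and (label is None or lab == label)
    )


def tf_idf(term, tokens):
    """Unsupervised weight: raw term frequency times log(N / df)."""
    df = doc_freq(term)
    if df == 0:
        return 0.0  # term never seen in the corpus
    return Counter(tokens)[term] * math.log(n_docs / df)


def tf_rf(term, tokens, positive="pos"):
    """Supervised weight: raw term frequency times relevance frequency.

    Assumes rf = log2(2 + a / max(1, c)), where a and c are the numbers of
    positive-class and other-class documents that contain the term.
    """
    a = doc_freq(term, positive)
    c = doc_freq(term) - a
    return Counter(tokens)[term] * math.log2(2 + a / max(1, c))


first_doc = tokenized[0][0]
for term in ("great", "movie"):
    print(term, round(tf_idf(term, first_doc), 3), round(tf_rf(term, first_doc), 3))
```

In this toy corpus, "great" and "movie" each occur in two of the four documents, so they share the same IDF factor even though "great" appears only in positive documents. The relevance-frequency factor uses the class labels and therefore gives "great" a larger weight than "movie".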


