Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT?

Abstract

The rapid spread of misinformation, particularly through online platforms, underscores the urgent need for reliable detection systems. This study explores the use of machine learning and natural language processing, specifically Support Vector Machines (SVM) and BERT, to detect fake news. We employ three distinct text vectorization methods for SVM, Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bag of Words (BoW), and evaluate how effectively each distinguishes genuine from fake news. Additionally, we compare these methods against the transformer-based language model BERT. Our approach includes detailed preprocessing, rigorous model implementation, and thorough evaluation to determine the most effective techniques. The results show that BERT achieves the highest accuracy, 99.98%, with an F1-score of 0.9998, while the SVM model with a linear kernel and BoW vectorization also performs exceptionally well, achieving 99.81% accuracy and an F1-score of 0.9980. These findings indicate that, although BERT performs best, SVM models with BoW and TF-IDF vectorization come remarkably close, offering highly competitive performance with the advantage of lower computational requirements.
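
The abstract describes BoW and TF-IDF vectors feeding a linear-kernel SVM. The dependency-free Python below is a minimal sketch of that kind of pipeline under stated assumptions, not the authors' implementation. Whitespace tokenization stands in for the paper's preprocessing, which is not specified here. The IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1 with per-document L2 normalization, which matches the scikit-learn `TfidfVectorizer` defaults. The linear SVM is trained with a Pegasos-style stochastic sub-gradient solver, which may differ from whatever solver the authors used. All function names and the toy headlines and labels are hypothetical, and Word2Vec and BERT are not covered.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumed minimal preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def build_vocab(docs):
    # Map each distinct training token to a column index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bow_vectors(docs, vocab):
    # Bag of Words: raw term counts; out-of-vocabulary tokens are dropped.
    vecs = []
    for doc in docs:
        v = [0.0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in vocab:
                v[vocab[tok]] = float(count)
        vecs.append(v)
    return vecs


def idf_weights(docs, vocab):
    # Smoothed IDF, ln((1 + N) / (1 + df)) + 1, fitted on the training corpus.
    df = [0] * len(vocab)
    for doc in docs:
        for tok in set(tokenize(doc)):
            if tok in vocab:
                df[vocab[tok]] += 1
    n = len(docs)
    return [math.log((1 + n) / (1 + d)) + 1.0 for d in df]


def tfidf_vectors(docs, vocab, idf):
    # TF-IDF: counts scaled by IDF, then L2-normalised per document.
    out = []
    for v in bow_vectors(docs, vocab):
        w = [tf * i for tf, i in zip(v, idf)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0  # guard all-OOV docs
        out.append([x / norm for x in w])
    return out


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    # Pegasos-style stochastic sub-gradient descent on the L2-regularised
    # hinge loss. Labels: +1 (fake) / -1 (genuine). The bias is a constant
    # feature of 1.0, so it is regularised too (a simplification).
    rng = random.Random(seed)
    Xb = [x + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # shrink by (1 - eta * lam)
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, X):
    # Sign of the linear decision function w . [x, 1].
    return [1 if sum(wj * xj for wj, xj in zip(w, x + [1.0])) >= 0 else -1
            for x in X]


# Hypothetical toy data; not from the paper's dataset.
train_docs = [
    "shocking miracle cure doctors hate",
    "you won't believe this shocking secret",
    "celebrity secret miracle revealed",
    "government announces new budget plan",
    "central bank raises interest rates",
    "parliament passes new budget bill",
]
train_labels = [1, 1, 1, -1, -1, -1]
test_docs = ["shocking secret cure", "bank announces budget"]

vocab = build_vocab(train_docs)
idf = idf_weights(train_docs, vocab)
w_bow = train_linear_svm(bow_vectors(train_docs, vocab), train_labels)
w_tfidf = train_linear_svm(tfidf_vectors(train_docs, vocab, idf), train_labels)

print(predict(w_bow, bow_vectors(test_docs, vocab)))           # [1, -1]
print(predict(w_tfidf, tfidf_vectors(test_docs, vocab, idf)))  # [1, -1]
```

The same trainer serves both representations because a linear SVM only needs a fixed-length vector per document. Prediction is a single dot product over the vocabulary, which is one reason such models are far cheaper to run than a transformer.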
