Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake News Detection

Abstract

With the COVID-19 pandemic, related fake news is spreading widely across social media. Believing it indiscriminately can cause serious harm to people's lives. However, universal language models may perform poorly at detecting such fake news because they lack large-scale annotated data and sufficient semantic understanding of domain-specific knowledge, while models trained only on the corresponding domain corpora are also mediocre because of insufficient learning. In this paper, we propose a novel transformer-based language model fine-tuning approach for COVID-19 fake news detection. First, the token vocabulary of each individual model is expanded to capture the actual semantics of professional phrases. Second, we adapt the heated-up softmax loss to distinguish hard samples, which are common in fake news because short text is ambiguous. Third, we apply adversarial training to improve the model's robustness. Last, the predicted features extracted by the universal language model RoBERTa and the domain-specific model CT-BERT are fused by a multilayer perceptron to integrate fine-grained and high-level specific representations. Quantitative experimental results on an existing COVID-19 fake news dataset show superior performance compared to state-of-the-art methods across various evaluation metrics. The best weighted average F1 score reaches 99.02%.
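
The heated-up softmax loss mentioned above can be loosely illustrated with a temperature-scaled softmax cross-entropy. The abstract does not give the exact formulation or schedule, so the following is only a minimal sketch under assumptions: the scale factor `alpha`, the function names, and the example scores are all hypothetical and are not taken from the paper.

```python
import math

def scaled_softmax(logits, alpha):
    """Softmax over logits multiplied by a scale factor alpha.

    alpha is an assumed hyperparameter: larger values sharpen the
    distribution, smaller values flatten it.
    """
    scaled = [alpha * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_cross_entropy(logits, label, alpha):
    """Cross-entropy of the true label under the scaled softmax."""
    probs = scaled_softmax(logits, alpha)
    return -math.log(probs[label])

if __name__ == "__main__":
    # Hypothetical scores for the classes (real, fake) on a borderline sample.
    logits = [1.2, 0.9]
    for alpha in (4.0, 1.0, 0.25):
        probs = scaled_softmax(logits, alpha)
        loss = scaled_cross_entropy(logits, 1, alpha)
        print(f"alpha={alpha}: probs={probs}, loss={loss}")
```

With a large `alpha`, a borderline sample whose true class scores slightly lower receives a larger loss, so hard samples weigh more heavily; lowering `alpha` over training softens that emphasis. How the paper actually sets or schedules this factor is not stated in the abstract.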
