Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media

Abstract

Misinformation spreads rapidly on social media, confusing the truth and targeting potentially vulnerable people. To mitigate its negative impact effectively, misinformation must first be detected accurately before a mitigation strategy can be applied, such as X's Community Notes, which is currently a manual process. This study takes a knowledge-based approach to misinformation detection, modelling the problem similarly to one of natural language inference. The EffiARA annotation framework is introduced, aiming to utilise inter- and intra-annotator agreement to understand the reliability of each annotator and to influence the training of large language models for classification based on annotator reliability. In assessing the EffiARA annotation framework, the Russo-Ukrainian Conflict Knowledge-Based Misinformation Classification Dataset (RUC-MCD) was developed and made publicly available. This study finds that sample weighting using annotator reliability performs best, utilising both inter- and intra-annotator agreement together with soft-label training. The highest classification performance achieved was a macro-F1 of 0.757 using Llama-3.2-1B and 0.740 using TwHIN-BERT-large.
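
To make the idea of reliability-based sample weighting with soft labels concrete, the following is a minimal sketch rather than the EffiARA implementation. It assumes each sample carries a soft label (a probability distribution over classes) and a single reliability score for the annotator who labelled it; the function names, the example values, and the choice to normalise by the sum of the weights are all illustrative assumptions not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(logits, soft_label):
    """Cross-entropy between a soft target distribution and the model output."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(soft_label, probs) if t > 0)


def reliability_weighted_loss(batch_logits, batch_soft_labels, reliabilities):
    """Weighted mean of per-sample losses.

    `reliabilities` holds one hypothetical annotator-reliability score per
    sample; dividing by the weight sum is an assumption made for this sketch.
    """
    total_weight = sum(reliabilities)
    if total_weight <= 0:
        raise ValueError("reliability weights must sum to a positive value")
    weighted = sum(
        w * soft_label_cross_entropy(logits, target)
        for logits, target, w in zip(batch_logits, batch_soft_labels, reliabilities)
    )
    return weighted / total_weight


# Illustrative values only: two samples, three classes.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
soft_labels = [[0.8, 0.2, 0.0], [0.0, 0.5, 0.5]]
reliabilities = [0.9, 0.4]  # hypothetical per-sample annotator reliability
loss = reliability_weighted_loss(logits, soft_labels, reliabilities)
```

Under this sketch, a sample labelled by a less reliable annotator contributes proportionally less to the training loss, while the soft label preserves the annotator's uncertainty between classes instead of collapsing it to a single hard label.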
