Identifying Compromised Accounts on Social Media Using Statistical Text Analysis

Abstract

Compromised social media accounts are legitimate user accounts that have been hijacked by a malicious third party and can cause various kinds of damage. Early detection of such accounts is important for limiting that damage. In this work we propose a novel, general framework for discovering compromised accounts by means of statistical text analysis. The framework is built on the observation that the language a user normally writes is measurably different from the language a hacker (or spammer) uses once the account has been compromised. We use the framework to develop specific algorithms based on language modeling, and we use the similarity between the language models of users and spammers as features in a supervised learning setup to identify compromised accounts. Evaluation on a large Twitter corpus of over 129 million tweets shows promising results for the proposed approach.
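
To make the idea of language-model similarity features concrete, the following is a minimal sketch and not the method of the paper. The abstract does not say which language model or similarity measure is used, so the sketch assumes additively smoothed unigram models and KL divergence. All function names and the toy tweets are hypothetical.

```python
import math
from collections import Counter


def unigram_model(texts, vocab, alpha=1.0):
    """Additively smoothed unigram model over a fixed vocabulary (assumed model)."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary (assumed measure)."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def similarity_features(recent, history, spam):
    """Return [KL(recent || history), KL(recent || spam)] as a feature vector."""
    vocab = {tok for t in recent + history + spam for tok in t.lower().split()}
    m_recent = unigram_model(recent, vocab)
    m_history = unigram_model(history, vocab)
    m_spam = unigram_model(spam, vocab)
    return [kl_divergence(m_recent, m_history), kl_divergence(m_recent, m_spam)]


# Toy data, invented for illustration only.
history = ["going for a run this morning", "coffee with friends this afternoon"]
spam = ["win free followers click here", "free gift card click this link"]
normal_recent = ["coffee this morning then a run"]
hijacked_recent = ["click here for free followers"]

normal_features = similarity_features(normal_recent, history, spam)
hijacked_features = similarity_features(hijacked_recent, history, spam)
```

In a supervised setup, feature vectors of this kind, computed per account and labeled as compromised or not, would be passed to a classifier of one's choice. On the toy data, the hijacked tweet lies farther from the user's history than from the spam model, and the normal tweet shows the opposite pattern.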
