Event detection in Twitter: A keyword volume approach

  • 2019-01-03 01:06:55
  • Ahmad Hany Hossny, Lewis Mitchell
  • 19


Event detection using social media streams needs a set of informative features with strong signals that need minimal preprocessing and are highly associated with events of interest. Identifying these informative features as keywords from Twitter is challenging, as people use informal language to express their thoughts and feelings. This informality includes acronyms, misspelled words, synonyms, transliteration and ambiguous terms. In this paper, we propose an efficient method to select the keywords frequently used in Twitter that are mostly associated with events of interest such as protests. The volume of these keywords is tracked in real time to identify the events of interest in a binary classification scheme. We use keywords within word-pairs to capture the context. The proposed method is to binarize vectors of daily counts for each word-pair by applying a spike detection temporal filter, then use the Jaccard metric to measure the similarity of the binary vector for each word-pair with the binary vector describing event occurrence. The top n word-pairs are used as features to classify any day to be an event or non-event day. The selected features are tested using multiple classifiers such as Naive Bayes, SVM, Logistic Regression, KNN and decision trees. They all produced AUC ROC scores up to 0.91 and F1 scores up to 0.79. The experiment is performed using the English language in multiple cities such as Melbourne, Sydney and Brisbane as well as the Indonesian language in Jakarta. The two experiments, comprising different languages and locations, yielded similar results.
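
The feature selection step described above (binarize daily counts, then rank word-pairs by Jaccard similarity with the event vector) can be illustrated with a minimal Python sketch. The abstract does not specify the spike detection filter, so the rule used here is an assumption: a day is flagged when its count exceeds the mean plus k standard deviations of a trailing window. The function names, the `window` and `k` parameters, and the toy counts are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def binarize_spikes(counts, window=7, k=2.0):
    """Return a 0/1 vector: 1 when a day's count exceeds mean + k*std
    of the preceding `window` days (an assumed spike rule, not the paper's)."""
    flags = []
    for t, count in enumerate(counts):
        history = counts[max(0, t - window):t]
        if len(history) < 2:
            flags.append(0)  # not enough history to estimate a baseline
            continue
        threshold = mean(history) + k * pstdev(history)
        flags.append(1 if count > threshold else 0)
    return flags


def jaccard(a, b):
    """Jaccard similarity of two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def top_word_pairs(daily_counts, event_days, n):
    """Rank word-pairs by Jaccard similarity between their spike vector
    and the binary event vector, and keep the n best."""
    scores = {
        pair: jaccard(binarize_spikes(counts), event_days)
        for pair, counts in daily_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy data (hypothetical): ten days, events on day 4 and day 8.
event_days = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
daily_counts = {
    ("protest", "march"): [2, 3, 2, 3, 20, 3, 2, 3, 25, 2],
    ("coffee", "morning"): [5, 6, 5, 6, 5, 6, 5, 6, 5, 6],
    ("football", "final"): [1, 1, 1, 30, 1, 1, 1, 1, 1, 1],
}
selected = top_word_pairs(daily_counts, event_days, n=1)
```

In this toy example the pair whose spikes coincide with the event days is ranked first. The binary spike vectors of the selected pairs would then serve as the per-day features passed to a classifier such as those listed in the abstract.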


