Fishing for Clickbaits in Social Images and Texts with Linguistically-Infused Neural Network Models

Abstract

This paper presents the results and conclusions of our participation in the Clickbait Challenge 2017 on automatic clickbait detection in social media. We first describe linguistically-infused neural network models and identify informative representations to predict the level of clickbaiting present in Twitter posts. Our models answer not only whether a post is clickbait, but also to what extent (e.g., not at all, slightly, considerably, or heavily clickbaity), using a score ranging from 0 to 1. We evaluate the predictive power of models trained on varied text and image representations extracted from tweets. Our best-performing model, which relies on the tweet text and linguistic markers of biased language extracted from the tweet and the corresponding page, yields a mean squared error (MSE) of 0.04, a mean absolute error (MAE) of 0.16, and an R² of 0.43 on the held-out test data. In the binary classification setup (clickbait vs. non-clickbait), our model achieves an F1 score of 0.69. We have not yet found that combining image representations with text yields a significant performance improvement. Nevertheless, this work is the first to present a preliminary analysis of objects extracted using the Google TensorFlow Object Detection API from images in clickbait vs. non-clickbait Twitter posts. Finally, we outline several steps to improve model performance as part of future work.
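For readers unfamiliar with the reported metrics, the following is a minimal sketch of how MSE, MAE, R² and a binary F1 score can be computed from clickbait scores in the range 0 to 1. It is an illustration only, not the evaluation code used in the paper: the function names are hypothetical, the toy scores are made up, and the 0.5 cut-off used to turn scores into clickbait vs. non-clickbait labels is an assumption rather than something stated in the abstract.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between annotated and predicted scores."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between annotated and predicted scores."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot would be zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def f1_binary(y_true: Sequence[float], y_pred: Sequence[float],
              threshold: float = 0.5) -> float:
    """F1 for the clickbait class after thresholding both score lists.

    The 0.5 threshold is an assumption made for this sketch.
    """
    true_lab = [t > threshold for t in y_true]
    pred_lab = [p > threshold for p in y_pred]
    tp = sum(t and p for t, p in zip(true_lab, pred_lab))
    fp = sum((not t) and p for t, p in zip(true_lab, pred_lab))
    fn = sum(t and (not p) for t, p in zip(true_lab, pred_lab))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: annotated clickbait strength vs. model output (made up).
    annotated = [0.0, 0.2, 0.6, 1.0]
    predicted = [0.1, 0.2, 0.4, 0.9]
    print("MSE:", mse(annotated, predicted))
    print("MAE:", mae(annotated, predicted))
    print("R2 :", r2(annotated, predicted))
    print("F1 :", f1_binary(annotated, predicted))
```

The regression metrics measure how closely the predicted clickbait strength tracks the annotated score, while F1 only reflects whether each post falls on the correct side of the threshold, which is why the abstract reports the two setups separately.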
