Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction

Abstract

Our entry into the HAHA 2019 Challenge placed 3rd in the classification task and 2nd in the regression task. We describe our system and innovations, and compare our results to a Naive Bayes baseline. A large Twitter-based corpus allowed us to train a Spanish-focused language model from scratch and transfer that knowledge to our competition model. To overcome the inherent errors in some labels, we reduce our class confidence with label smoothing in the loss function. All the code for our project is included in a GitHub repository for easy reference and to enable replication by others.
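The sketch below shows how label smoothing lowers class confidence in a cross-entropy loss. It is a minimal illustration in plain Python, not the implementation from our repository. The function names and the smoothing factor `eps=0.1` are hypothetical choices for illustration. Instead of a one-hot target, the loss uses a softened target that keeps `1 - eps` of the probability mass on the given label and spreads `eps` uniformly over all `k` classes. This penalizes overconfident predictions on labels that may be wrong.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single example's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_targets(target: int, k: int, eps: float) -> List[float]:
    """Hypothetical helper: (1 - eps) * one_hot(target) + eps / k."""
    return [(1.0 - eps) * (1.0 if c == target else 0.0) + eps / k for c in range(k)]


def label_smoothing_ce(logits: Sequence[float], target: int, eps: float = 0.1) -> float:
    """Cross-entropy against the smoothed target distribution.

    Written as (1 - eps) * NLL(target) + eps * mean over classes of -log p,
    which equals -sum(q * log p) for q = smoothed_targets(target, k, eps).
    """
    k = len(logits)
    logp = log_softmax(logits)
    nll = -logp[target]          # standard cross-entropy term
    uniform = -sum(logp) / k     # cross-entropy against a uniform target
    return (1.0 - eps) * nll + eps * uniform


# Binary humor classification example (illustrative logits, not real model output).
logits = [4.0, -4.0]  # model is very confident the tweet is class 0
print(label_smoothing_ce(logits, target=0, eps=0.0))  # plain cross-entropy
print(label_smoothing_ce(logits, target=0, eps=0.1))  # larger: confidence is penalized
```

With `eps=0.0` the function reduces to ordinary cross-entropy. With `eps > 0`, a model that puts nearly all probability on one class is charged a small extra cost, which discourages it from fitting noisy labels too tightly.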
