Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space

Abstract

Word embeddings represent words in a numeric space in such a way that semantic relations between words are encoded as distances and directions in the vector space. Cross-lingual word embeddings map words from one language to the vector space of another language, or words from multiple languages to the same vector space where similar words are aligned. Cross-lingual embeddings can be used to transfer machine learning models between languages and thereby compensate for insufficient data in less-resourced languages. We use cross-lingual word embeddings to transfer machine learning prediction models for Twitter sentiment between 13 languages. We focus on two transfer mechanisms using the joint numerical space for many languages as implemented in the LASER library: the transfer of trained models, and expansion of training sets with instances from other languages. Our experiments show that the transfer of models between similar languages is sensible, while dataset expansion did not increase the predictive performance.
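The two transfer mechanisms named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 2-dimensional vectors are invented stand-ins for sentence embeddings in a shared space (the LASER library is not called), and the nearest-centroid classifier is a deliberately simple substitute, not the prediction model used in the paper.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(vectors, labels):
    """Train a nearest-centroid model: one mean vector per label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Hypothetical embeddings of source-language tweets in the shared space.
source_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
source_y = ["positive", "positive", "negative", "negative"]

# Hypothetical embeddings of target-language tweets in the same space.
target_X = [[0.85, 0.25], [0.15, 0.75]]
target_y = ["positive", "negative"]

# Mechanism 1: train on the source language only, apply to the target language.
model = fit_centroids(source_X, source_y)
transfer_preds = [predict(model, v) for v in target_X]

# Mechanism 2: expand a small target-language training set with source instances.
small_target_X = [[0.7, 0.3], [0.3, 0.7]]
small_target_y = ["positive", "negative"]
expanded_model = fit_centroids(small_target_X + source_X, small_target_y + source_y)
expanded_preds = [predict(expanded_model, v) for v in target_X]
```

Because both languages share one vector space, the same model object can score tweets from either language without any retraining; the toy data says nothing about how well either mechanism performs in practice.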
