The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus

  • 2021-05-31 14:45:06
  • Samia Touileb, Jeremy Barnes

Abstract

Recent years have seen a rise in interest for cross-lingual transfer between languages with similar typology, and between languages of various scripts. However, the interplay between language similarity and difference in script on cross-lingual transfer is a less studied problem. We explore this interplay on cross-lingual transfer for two supervised tasks, namely part-of-speech tagging and sentiment analysis. We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts, as well as annotations for sentiment and topic categories. We perform baseline experiments by fine-tuning multi-lingual language models. We further explore the effect of script vs. language similarity in cross-lingual transfer by fine-tuning multi-lingual models on languages which are a) typologically distinct, but use the same script, b) typologically similar, but use a distinct script, or c) are typologically similar and use the same script. We find there is a delicate relationship between script and typology for part-of-speech, while sentiment analysis is less sensitive.

 
