Computational Approaches to Arabic-English Code-Switching

  • 2024-10-17 09:20:29
  • Caroline Sabty

Abstract

Natural Language Processing (NLP) is a vital computational method for addressing language processing, analysis, and generation. NLP tasks form the core of many daily applications, from automatic text correction to speech recognition. While significant research has focused on NLP tasks for the English language, less attention has been given to Modern Standard Arabic and Dialectal Arabic. Globalization has also contributed to the rise of Code-Switching (CS), where speakers mix languages within conversations and even within individual words (intra-word CS). This is especially common in Arab countries, where people often switch between dialects or between dialects and a foreign language they master. CS between Arabic and English is frequent in Egypt, especially on social media. Consequently, a significant amount of code-switched content can be found online. Such code-switched data needs to be investigated and analyzed for several NLP tasks to tackle the challenges of this multilingual phenomenon and Arabic language challenges. No work has been done before for several integral NLP tasks on Arabic-English CS data. In this work, we focus on the Named Entity Recognition (NER) task and other tasks that help propose a solution for the NER task on CS data, e.g., Language Identification. This work addresses this gap by proposing and applying state-of-the-art techniques for Modern Standard Arabic and Arabic-English NER. We have created the first annotated CS Arabic-English corpus for the NER task. Also, we apply two enhancement techniques to improve the NER tagger on CS data using CS contextual embeddings and data augmentation techniques. All methods showed improvements in the performance of the NER taggers on CS data. Finally, we propose several intra-word language identification approaches to determine the language type of a mixed text and identify whether it is a named entity or not.

 
