Arabic natural language processing: An overview

  • 2019-03-07 09:22:35
  • Imane Guellil, Houda Saâdane, Faical Azouaou, Billel Gueni, Damien Nouvel

Abstract

Arabic is recognised as the 4th most used language of the Internet. Arabic has three main varieties: (1) Classical Arabic (CA), (2) Modern Standard Arabic (MSA), and (3) Arabic Dialect (AD). MSA and AD can be written either in Arabic script or in Roman script (Arabizi), which corresponds to Arabic written with Latin letters, numerals and punctuation. Due to the complexity of this language and the number of corresponding challenges for NLP, many surveys have been conducted in order to synthesise the work done on Arabic. However, these surveys principally focus on two varieties of Arabic (MSA and AD, written in Arabic letters only); they are also somewhat dated (no such survey since 2015) and therefore do not cover recent resources and tools. To bridge the gap, we propose a survey focusing on 90 recent research papers (74% of which were published after 2015). Our study presents and classifies the work done on the three varieties of Arabic, concentrating on both Arabic and Arabizi, and associates each work with its publicly available resources, where these exist.

 
