AraBERT: Transformer-based Model for Arabic Language Understanding

  • 2020-03-30 12:34:28
  • Wissam Antoun, Fady Baly, Hazem Hajj

Abstract

The Arabic language is a morphologically rich language with relatively few resources and a less explored syntax compared to English. Given these limitations, Arabic Natural Language Processing (NLP) tasks like Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA) have proven to be very challenging to tackle. Recently, with the surge of transformer-based models, language-specific BERT-based models have proven to be very efficient at language understanding, provided they are pre-trained on a very large corpus. Such models were able to set new standards and achieve state-of-the-art results for most NLP tasks. In this paper, we pre-trained BERT specifically for the Arabic language in the pursuit of achieving the same success that BERT did for the English language. The performance of AraBERT is compared to multilingual BERT from Google and other state-of-the-art approaches. The results showed that the newly developed AraBERT achieved state-of-the-art performance on most tested Arabic NLP tasks. The pretrained AraBERT models are publicly available at https://github.com/aub-mind/arabert, in the hope of encouraging research and applications for Arabic NLP.

 
