ParsiNLU: A Suite of Language Understanding Challenges for Persian

  • 2021-07-13 17:02:32
  • Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, Yadollah Yaghoobzadeh

Abstract

Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, most of this progress remains concentrated on resource-rich languages like English. This work focuses on Persian, one of the world's widely spoken languages, for which few NLU datasets are nonetheless available. High-quality evaluation datasets are necessary for reliably assessing progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian that includes a range of high-level tasks, such as Reading Comprehension and Textual Entailment. These datasets are collected in a multitude of ways, often involving manual annotation by native speakers, resulting in over 14.5k new instances across 6 distinct NLU tasks. We also present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.

 
