Towards Machine Translation for the Kurdish Language

  • 2020-10-12 21:28:57
  • Sina Ahmadi, Mariam Masoud
Machine translation is the task of translating texts from one language to another using computers. It has been one of the major tasks in natural language processing and computational linguistics, motivated by the goal of facilitating human communication. Kurdish, an Indo-European language, has received little attention in this realm because it is a less-resourced language. In this paper, we therefore address the main issues in creating a machine translation system for the Kurdish language, with a focus on the Sorani dialect. We describe the scarce parallel data available for training a neural machine translation model for Sorani Kurdish-English translation. We also discuss some of the major challenges in Kurdish language translation and demonstrate how fundamental text processing tasks, such as tokenization, can improve translation performance.


