Enhancing Language Learning through Technology: Introducing a New English-Azerbaijani (Arabic Script) Parallel Corpus

  • 2024-07-06 22:23:20
  • Jalil Nourmohammadi Khiarak, Ammar Ahmadi, Taher Akbari Saeed, Meysam Asgari-Chenaghlu, Toğrul Atabay, Mohammad Reza Baghban Karimi, Ismail Ceferli, Farzad Hasanvand, Seyed Mahboub Mousavi, Morteza Noshad

Abstract

This paper introduces a pioneering English-Azerbaijani (Arabic Script) parallel corpus designed to bridge the technological gap in language learning and machine translation (MT) for under-resourced languages. The corpus consists of 548,000 parallel sentences and approximately 9 million words per language. It is drawn from diverse sources such as news articles and holy texts, and it aims to enhance natural language processing (NLP) applications and language education technology. The corpus marks a significant step forward in linguistic resources, particularly for Turkic languages, which have lagged behind in the neural machine translation (NMT) revolution. By presenting the first comprehensive case study for the English-Azerbaijani (Arabic Script) language pair, this work underscores the transformative potential of NMT in low-resource contexts. The development and use of this corpus not only facilitate the advancement of machine translation systems tailored to specific linguistic needs but also promote inclusive language learning through technology. The findings demonstrate the corpus's effectiveness in training deep learning MT systems and underscore its role as an essential asset for researchers and educators aiming to foster bilingual education and multilingual communication. This research paves the way for future exploration of NMT applications for languages lacking substantial digital resources, thereby enhancing global language education frameworks. The Python package of our code is available at https://pypi.org/project/chevir-kartalol/, and a website is accessible at https://translate.kartalol.com/.
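The corpus is described in terms of sentence pairs and words per language. The following is a minimal sketch of how such statistics might be computed for a line-aligned parallel corpus. It assumes a hypothetical tab-separated layout (one English sentence, a tab, then its Azerbaijani translation per line) and counts words by whitespace splitting. The file layout and the function names `parse_tsv_lines` and `corpus_stats` are illustrative assumptions. They are not the released data format or the API of the `chevir-kartalol` package.

```python
from typing import Dict, Iterable, Iterator, Tuple


# Hypothetical format (assumption): one pair per line, "english<TAB>azerbaijani".
def parse_tsv_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (english, azerbaijani) pairs, skipping blank or malformed lines."""
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # blank or malformed line: no tab separator
        en, az = line.split("\t", 1)
        yield en.strip(), az.strip()


def corpus_stats(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count sentence pairs and whitespace-delimited words on each side."""
    stats = {"pairs": 0, "en_words": 0, "az_words": 0}
    for en, az in pairs:
        stats["pairs"] += 1
        stats["en_words"] += len(en.split())
        stats["az_words"] += len(az.split())
    return stats


if __name__ == "__main__":
    # Illustrative sample lines, not taken from the corpus.
    sample = [
        "Hello world\tسلام دونیا\n",
        "\n",
        "Good morning to you\tصبحینیز خیر\n",
    ]
    print(corpus_stats(parse_tsv_lines(sample)))
    # → {'pairs': 2, 'en_words': 6, 'az_words': 4}
```

For a file on disk, the same functions could be applied as `with open(path, encoding="utf-8") as f: corpus_stats(parse_tsv_lines(f))`. Whitespace splitting is only a rough proxy for word counts. For example, Arabic-script words joined by a zero-width non-joiner stay as a single token. The paper's reported figures may therefore rely on a different tokenizer.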

 
