Ngambay-French Neural Machine Translation (sba-Fr)

  • 2023-08-25 18:13:20
  • Sakayo Toadoum Sari, Angela Fan, Lema Logamou Seknewna

Abstract

In Africa, and in the world at large, there is an increasing focus on developing Neural Machine Translation (NMT) systems to overcome language barriers. NMT for low-resource languages is particularly compelling because it involves learning from limited labelled data. However, obtaining a well-aligned parallel corpus for low-resource languages can be challenging. The disparity between the technological advancement of a few global languages and the lack of NMT research on the local languages of Chad is striking: no end-to-end NMT experiments have yet been attempted for low-resource Chadian languages. Additionally, unlike some other African languages, Chadian languages lack well-structured online data collected for Natural Language Processing research. However, a guided data-gathering approach can produce bitext for many translation pairs that combine a Chadian language with a well-known language that has ample data. In this project, we created the first sba-Fr Dataset, a corpus of Ngambay-to-French translations, and fine-tuned three pre-trained models on it. Our experiments show that the M2M100 model outperforms the other models, achieving high BLEU scores on both the original data and the original+synthetic data. The bitext dataset is publicly available for research purposes.
