Improving Multilingual Neural Machine Translation System for Indic Languages

  • 2022-09-27 10:51:56
  • Sudhansu Bala Das, Atharv Biradar, Tapas Kumar Mishra, Bidyut Kumar Patra

Abstract

Machine Translation System (MTS) serves as an effective tool for communication by translating text or speech from one language to another. The need for an efficient translation system is especially evident in a large multilingual environment like India, where English and a set of Indian Languages (ILs) are officially used. In contrast with English, ILs are still treated as low-resource languages due to the unavailability of corpora. To address this asymmetry, multilingual neural machine translation (MNMT) has emerged as a promising approach. In this paper, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems, one for English-Indic (one-to-many) and the other for Indic-English (many-to-one), with a shared encoder-decoder covering 15 language pairs (30 translation directions). Since most IL pairs have only a scanty amount of parallel corpora, which is not sufficient for training a machine translation model, we explore various augmentation strategies to improve the overall translation quality of the proposed model. A state-of-the-art Transformer architecture is used to realize the proposed model. Experiments on a substantial amount of data show that it outperforms conventional models. In addition, the paper addresses the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages of the same family in boosting the performance of low-resource languages. Moreover, the experimental results show the advantage of backtranslation and domain adaptation for ILs in enhancing the translation quality of both source and target languages. Using all these key approaches, our proposed model outperforms the baseline model in terms of BLEU (BiLingual Evaluation Understudy) score for a set of ILs.
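Since the results are reported as BLEU scores, here is a minimal sketch of corpus-level BLEU following its standard definition (Papineni et al., 2002): clipped n-gram precisions for n = 1..4, combined by a geometric mean and scaled by a brevity penalty. The sketch assumes pre-tokenized input, one reference per sentence, uniform weights and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical illustrations. The abstract does not state which BLEU implementation or tokenization the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: one reference per hypothesis, uniform weights, no smoothing.

    hypotheses, references: lists of token lists, aligned by index.
    Returns a score in [0, 1]; multiply by 100 for the usual reporting scale.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # number of hypothesis n-grams, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        # Without smoothing, any zero precision makes the geometric mean zero.
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [ref]))  # identical output scores 1.0
```

A hypothesis that drops the final word ("the cat sat on the") keeps every n-gram precision at 1 but is penalised by the brevity factor exp(1 - 6/5). Published scores are usually computed with a standardized tool such as sacreBLEU, because tokenization choices change BLEU noticeably.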
