Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages

  • 2022-06-09 13:03:29
  • Nalin Kumar, Deepak Kumar, Subhankar Mishra

Abstract

Neural Machine Translation (NMT) models have been effective on large bilingual datasets. However, the existing methods and techniques show that the model's performance is highly dependent on the number of examples in training data. For many languages, having such an amount of corpora is a far-fetched dream. Taking inspiration from monolingual speakers exploring new languages using bilingual dictionaries, we investigate the applicability of bilingual dictionaries for languages with extremely low, or no bilingual corpus. In this paper, we explore methods using bilingual dictionaries with an NMT model to improve translations for extremely low resource languages. We extend this work to multilingual systems, exhibiting zero-shot properties. We present a detailed analysis of the effects of the quality of dictionaries, training dataset size, language family, etc., on the translation quality. Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.

 
