Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages

  • 2018-02-14 05:26:45
  • Gyu-Hyeon Choi, Jong-Hun Shin, Young-Kil Kim

Abstract

In machine translation, we often try to collect more resources to improve performance. However, most language pairs, such as Korean-Arabic and Korean-Vietnamese, do not have enough resources to train machine translation systems. In this paper, we propose the use of synthetic methods for extending a low-resource corpus and apply them to a multi-source neural machine translation model. We showed that corpus extension using the synthetic method improves machine translation performance. We specifically focused on how to create source sentences that yield better target sentences, including through the use of synthetic methods. We found that corpus extension also improves the performance of multi-source neural machine translation. We showed that corpus extension and the multi-source model are efficient methods for a low-resource language pair. Furthermore, when both methods were used together, machine translation performance improved further.
