A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages

  • 2019-11-19 06:50:59
  • Minh-Thang Luong, Preslav Nakov, Min-Yen Kan

Abstract

We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.
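To make the idea of a morpheme-level unit that still respects word boundaries concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a hypothetical marking convention, not stated in the abstract, in which a trailing "+" on a morpheme means the next morpheme belongs to the same word. The function names `segment_to_words` and `respects_word_boundaries` are illustrative only. The first rebuilds words from morphemes, as would be needed before scoring morpheme-level output with word-level BLEU. The second checks whether a candidate morpheme span covers only whole words, which is one plausible reading of "word boundary-aware" phrase extraction.

```python
# Assumed convention (hypothetical, not from the abstract): a morpheme that
# ends with "+" is continued by the next morpheme within the same word.


def segment_to_words(morphemes):
    """Reassemble a sequence of marked morphemes into surface words."""
    words, current = [], ""
    for m in morphemes:
        if m.endswith("+"):
            # Word continues: strip the marker and keep accumulating.
            current += m[:-1]
        else:
            # Word ends here: emit it and start a new one.
            words.append(current + m)
            current = ""
    if current:
        # A dangling open morpheme at the end is emitted as its own word.
        words.append(current)
    return words


def respects_word_boundaries(morphemes, start, end):
    """Return True if morphemes[start:end] covers only whole words."""
    if not (0 <= start < end <= len(morphemes)):
        return False
    # The span starts a word if nothing before it is left open.
    starts_word = start == 0 or not morphemes[start - 1].endswith("+")
    # The span ends a word if its last morpheme is not left open.
    ends_word = not morphemes[end - 1].endswith("+")
    return starts_word and ends_word


# Finnish "talossani on kissa", segmented by hand for illustration.
example = ["talo+", "ssa+", "ni", "on", "kissa"]
print(segment_to_words(example))
print(respects_word_boundaries(example, 0, 3))  # whole word "talossani"
print(respects_word_boundaries(example, 1, 3))  # starts inside a word
```

Under this convention a span such as `example[1:3]` would be rejected because it begins in the middle of a word, while `example[0:3]` and `example[3:5]` would be kept.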

 
