Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation

Abstract

The overreliance on large parallel corpora significantly limits the applicability of machine translation systems to the majority of language pairs. Back-translation has been dominantly used in previous approaches for unsupervised neural machine translation, where pseudo sentence pairs are generated to train the models with a reconstruction loss. However, the pseudo sentences are usually of low quality as translation errors accumulate during training. To avoid this fundamental issue, we propose an alternative but more effective approach, extract-edit, to extract and then edit real sentences from the target monolingual corpora. Furthermore, we introduce a comparative translation loss to evaluate the translated target sentences and thus train the unsupervised translation systems. Experiments show that the proposed approach consistently outperforms the previous state-of-the-art unsupervised machine translation systems across two benchmarks (English-French and English-German) and two low-resource language pairs (English-Romanian and English-Russian) by more than 2 (up to 3.63) BLEU points.
