Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder

Abstract

Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation of cross-lingual embeddings, using only monolingual corpora but without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised neural translation systems without costly iterative training. We also analyze the effect of vocabulary size and denoising type on the translation performance, which provides better understanding of learning the cross-lingual word embedding and its usage in translation.
