Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder

  • 2019-01-06 18:30:50
  • Yunsu Kim, Jiahui Geng, Hermann Ney


Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation of cross-lingual embeddings, using only monolingual corpora but without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised neural translation systems without costly iterative training. We also analyze the effect of vocabulary size and denoising type on the translation performance, which provides better understanding of learning the cross-lingual word embedding and its usage in translation.
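
To make the idea of context-aware search concrete, the following is a minimal sketch, not the authors' implementation. It assumes each source word comes with a few candidate translations scored by embedding similarity, and that a target-side bigram language model is available. The function `translate`, the parameters `lm_weight`, `beam_size`, and `floor`, the log-linear score combination, and all numbers in the toy data are hypothetical and chosen only for illustration.

```python
import math


def translate(source, candidates, bigram_lm, lm_weight=1.0, beam_size=2, floor=1e-4):
    """Word-by-word beam search (illustrative sketch).

    Each hypothesis is scored by the sum of log lexical similarities plus
    a weighted sum of log bigram LM probabilities. Similarities must be > 0.
    """
    beams = [(0.0, ["<s>"])]  # (score, target words so far)
    for word in source:
        expanded = []
        for score, hyp in beams:
            for target, sim in candidates[word]:
                # Unseen bigrams fall back to a small floor probability.
                lm_prob = bigram_lm.get((hyp[-1], target), floor)
                new_score = score + math.log(sim) + lm_weight * math.log(lm_prob)
                expanded.append((new_score, hyp + [target]))
        # Keep only the best `beam_size` partial translations.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_size]
    return beams[0][1][1:]  # drop the sentence-start marker


# Hypothetical toy data: similarity alone slightly prefers "bench".
candidates = {
    "die": [("the", 0.90)],
    "bank": [("bench", 0.60), ("bank", 0.55)],
}
bigram_lm = {
    ("<s>", "the"): 0.50,
    ("the", "bank"): 0.30,
    ("the", "bench"): 0.05,
}

no_lm = translate(["die", "bank"], candidates, bigram_lm, lm_weight=0.0)
with_lm = translate(["die", "bank"], candidates, bigram_lm, lm_weight=1.0)
print(no_lm, with_lm)
```

With `lm_weight=0.0` the search reduces to picking the most similar word for each position, while a positive weight lets the preceding target word influence the choice. Word order is left unchanged here; in the abstract, reordering is handled by the denoising autoencoder.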

