Unsupervised Cross-lingual Transfer of Word Embedding Spaces

  • 2018-09-10 23:22:43
  • Ruochen Xu, Yiming Yang, Naoki Otani, Yuexin Wu

Abstract

Cross-lingual transfer of word embeddings aims to establish the semantic mappings among words in different languages by learning the transformation functions over the corresponding word embedding spaces. Successfully solving this problem would benefit many downstream tasks, such as translating text classification models from resource-rich languages (e.g. English) to low-resource languages. Supervised methods for this problem rely on the availability of cross-lingual supervision, using either parallel corpora or bilingual lexicons as the labeled data for training, which may not be available for many low-resource languages. This paper proposes an unsupervised learning approach that does not require any cross-lingual labeled data. Given two monolingual word embedding spaces for any language pair, our algorithm optimizes the transformation functions in both directions simultaneously, based on distributional matching as well as minimizing the back-translation losses. We use a neural network implementation to calculate the Sinkhorn distance, a well-defined distributional similarity measure, and optimize our objective through back-propagation. Our evaluation on benchmark datasets for bilingual lexicon induction and cross-lingual word similarity prediction shows stronger or competitive performance of the proposed method compared to other state-of-the-art supervised and unsupervised baseline methods over many language pairs.
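
The two ingredients named in the abstract, a Sinkhorn distance for distributional matching and a back-translation loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the uniform word weights, the squared Euclidean cost, the regularization strength `reg`, and the use of plain linear maps `G` (source to target) and `F` (target to source) are all assumptions made for illustration. The sketch also only evaluates the quantities; it does not perform the back-propagation described in the abstract.

```python
import numpy as np

def sinkhorn_distance(x, y, reg=0.1, n_iters=200):
    """Entropy-regularized transport cost between two sets of vectors.

    Assumes uniform weights over the rows of x and y and a squared
    Euclidean ground cost (illustrative choices, not from the paper).
    Returns the transport cost and the transport plan.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean distances, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    a = np.full(n, 1.0 / n)  # uniform source marginal
    b = np.full(m, 1.0 / m)  # uniform target marginal
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    # Sinkhorn iterations: alternately rescale columns and rows
    # so the plan matches both marginals.
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan

def back_translation_loss(x, G, F):
    """Mean squared error after mapping x with G and back with F.

    G and F are hypothetical linear maps (matrices) standing in for
    the transformation functions in the two directions.
    """
    round_trip = x @ G.T @ F.T
    return float(((round_trip - x) ** 2).sum(axis=1).mean())
```

A combined objective in this spirit would add the Sinkhorn distance between `x @ G.T` and `y`, the Sinkhorn distance between `y @ F.T` and `x`, and weighted back-translation losses in both directions. For unit-normalized embeddings the squared distances stay at most 4, so `reg=0.1` avoids underflow in the kernel; larger costs would call for a log-domain variant.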

 
