A Bilingual Generative Transformer for Semantic Sentence Embedding

Abstract

Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such embeddings: properties shared by both sentences in a translation pair are likely semantic, while divergent properties are likely stylistic or language-specific. We propose a deep latent variable model that attempts to perform source separation on parallel sentences, isolating what they have in common in a latent semantic vector, and explaining what is left over with language-specific latent vectors. Our proposed approach differs from past work on semantic sentence encoding in two ways. First, by using a variational probabilistic framework, we introduce priors that encourage source separation, and can use our model's posterior to predict sentence embeddings for monolingual data at test time. Second, we use high-capacity transformers as both data generating distributions and inference networks -- contrasting with most past work on sentence embeddings. In experiments, our approach substantially outperforms the state-of-the-art on a standard suite of unsupervised semantic similarity evaluations. Further, we demonstrate that our approach yields the largest gains on more difficult subsets of these evaluations where simple word overlap is not a good indicator of similarity.
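
To make "closeness in embedding space" concrete, the following minimal sketch scores pairs of sentence vectors with cosine similarity. The choice of cosine similarity is an assumption here, since the abstract does not name a specific closeness measure, and the toy vectors are hypothetical values made up for illustration rather than outputs of the proposed model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (illustrative only, not model outputs).
emb_a = [1.0, 0.0, 1.0]
emb_b = [2.0, 0.0, 2.0]  # points in the same direction as emb_a
emb_c = [0.0, 1.0, 0.0]  # orthogonal to emb_a

score_ab = cosine_similarity(emb_a, emb_b)  # high: vectors are aligned
score_ac = cosine_similarity(emb_a, emb_c)  # low: vectors share no direction
```

Under this assumed measure, a sentence pair whose embeddings are aligned would be treated as semantically close, and a pair with orthogonal embeddings as unrelated.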
