Paraphrase Thought: Sentence Embedding Module Imitating Human Language Recognition

Abstract

Sentence embedding is an important research topic in natural language processing. It is essential to generate a good embedding vector that fully reflects the semantic meaning of a sentence in order to achieve enhanced performance on various natural language processing tasks, such as machine translation and document classification. Thus far, various sentence embedding models have been proposed, and their feasibility has been demonstrated through good performance on tasks following embedding, such as sentiment analysis and sentence classification. However, because the performance of sentence classification and sentiment analysis can be enhanced by using a simple sentence representation method, good performance on such tasks is not sufficient to claim that these models fully reflect the meanings of sentences. In this paper, inspired by human language recognition, we propose the following concept of semantic coherence, which should be satisfied by a good sentence embedding method: similar sentences should be located close to each other in the embedding space. Then, we propose the Paraphrase-Thought (P-thought) model to pursue semantic coherence as much as possible. Experimental results on two paraphrase identification datasets (MS COCO and STS benchmark) show that the P-thought models outperform the benchmarked sentence embedding methods.
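The semantic coherence criterion stated above can be illustrated with a small sketch. It is not the P-thought model: it only checks whether a sentence vector lies closer to a paraphrase than to an unrelated sentence. Cosine similarity is assumed as the closeness measure (the abstract does not name one), the helper names `cosine_similarity` and `is_coherent` are hypothetical, and the three vectors are made-up toy values rather than real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_coherent(anchor, paraphrase, unrelated):
    """True if the paraphrase is closer to the anchor than the unrelated sentence.

    Closeness is measured by cosine similarity, which is an assumption of
    this sketch, not something the abstract specifies.
    """
    return cosine_similarity(anchor, paraphrase) > cosine_similarity(anchor, unrelated)


# Toy vectors invented for illustration; a real check would use model outputs.
anchor = [0.9, 0.1, 0.2]
paraphrase = [0.8, 0.2, 0.1]
unrelated = [0.1, 0.9, 0.3]

print(is_coherent(anchor, paraphrase, unrelated))
```

An embedding method that satisfies semantic coherence would pass this kind of check for most paraphrase pairs in a dataset, which is the property the paraphrase identification experiments are meant to probe.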
