A Deep Network Model for Paraphrase Detection in Short Text Messages

Abstract

This paper is concerned with paraphrase detection. The ability to detectsimilar sentences written in natural language is crucial for severalapplications, such as text mining, text summarization, plagiarism detection,authorship authentication and question answering. Given two sentences, theobjective is to detect whether they are semantically identical. An importantinsight from this work is that existing paraphrase systems perform well whenapplied on clean texts, but they do not necessarily deliver good performanceagainst noisy texts. Challenges with paraphrase detection on user generatedshort texts, such as Twitter, include language irregularity and noise. To copewith these challenges, we propose a novel deep neural network-based approachthat relies on coarse-grained sentence modeling using a convolutional neuralnetwork and a long short-term memory model, combined with a specificfine-grained word-level similarity matching model. Our experimental resultsshow that the proposed approach outperforms existing state-of-the-artapproaches on user-generated noisy social media data, such as Twitter texts,and achieves highly competitive performance on a cleaner corpus.

Quick Read (beta)

loading the full paper ...