Abstract
Entity linkage (EL) is a critical problem in data cleaning and integration. In the past several decades, EL has typically been done by rule-based systems or traditional machine learning models with hand-curated features, both of which depend heavily on manual human input. With the ever-increasing growth of new data, deep learning (DL) based approaches have been proposed to alleviate the high cost of EL associated with the traditional models. Existing exploration of DL models for EL strictly follows the well-known twin-network architecture. However, we argue that the twin-network architecture is sub-optimal for EL, leading to inherent drawbacks in existing models. To address these drawbacks, we propose a novel and generic contrastive DL framework for EL. The proposed framework is able to capture both syntactic and semantic matching signals and pays attention to subtle but critical differences. Based on the framework, we develop a contrastive DL approach for EL, called CorDEL, with three powerful variants. We evaluate CorDEL with extensive experiments conducted on both public benchmark datasets and a real-world dataset. CorDEL outperforms previous state-of-the-art models by 5.2% on public benchmark datasets. Moreover, CorDEL yields a 2.4% improvement over the current best DL model on the real-world dataset, while reducing the number of training parameters by 97.6%.