KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

Abstract

Pre-trained language representation models (PLMs) do not capture factual knowledge from text well. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models do not utilize the rich text data. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also effectively learn KE through the abundant information in text. In KEPLER, we encode textual descriptions of entities with a PLM as their embeddings, and then jointly optimize the KE and language modeling objectives. Experimental results show that KEPLER achieves state-of-the-art performance on various NLP tasks, and also works remarkably well as an inductive KE model on the link prediction task. Furthermore, for pre-training KEPLER and evaluating the KE performance, we construct Wikidata5M, a large-scale KG dataset with aligned entity descriptions, and benchmark state-of-the-art KE methods on it. It shall serve as a new KE benchmark and facilitate research on large KGs, inductive KE, and KGs with text. The dataset can be obtained from https://deepgraphlearning.github.io/project/wikidata5m.
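
The joint objective described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: `toy_encode` is a hypothetical hash-based stand-in for a real PLM encoder, the translation-style distance and the margin loss are one common KE choice that the abstract does not specify, and the language modeling loss is passed in as a plain number rather than computed.

```python
import hashlib
import math

DIM = 8  # toy embedding size (assumption; a real PLM uses hundreds of dimensions)


def toy_encode(text, dim=DIM):
    """Hypothetical stand-in for a PLM: hash each token into a unit-length vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()  # 16 bytes
        for i in range(dim):
            vec[i] += digest[i] / 255.0 - 0.5
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def translation_distance(head, relation, tail):
    """Distance ||h + r - t||; small when the triple is plausible (assumed scoring)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def ke_margin_loss(pos_distance, neg_distance, margin=1.0):
    """Margin loss: push true triples closer than corrupted ones (assumed loss)."""
    return max(0.0, margin + pos_distance - neg_distance)


def joint_loss(ke_loss, lm_loss):
    """Joint objective: the KE loss and the language modeling loss are summed."""
    return ke_loss + lm_loss


# Entity embeddings come from encoding textual descriptions, not from a lookup table.
head = toy_encode("Johannes Kepler was a German astronomer")
tail = toy_encode("Germany is a country in Central Europe")
corrupted_tail = toy_encode("A banana is an elongated edible fruit")
relation = [0.0] * DIM  # a learned relation embedding in practice; zeros here

pos = translation_distance(head, relation, tail)
neg = translation_distance(head, relation, corrupted_tail)
lm_loss = 2.0  # placeholder value standing in for a masked language modeling loss
total = joint_loss(ke_margin_loss(pos, neg), lm_loss)
```

Because entity embeddings are produced from descriptions, an entity unseen during training can still be embedded from its text, which is what makes the inductive link prediction setting possible.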
