KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

Abstract

Pre-trained language representation models (PLMs) learn effective language representations from large-scale unlabeled corpora. Knowledge embedding (KE) algorithms encode the entities and relations in knowledge graphs into informative embeddings for knowledge graph completion and to provide external knowledge for various NLP applications. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which not only better integrates factual knowledge into PLMs but also effectively learns knowledge graph embeddings. KEPLER uses a PLM to encode the textual descriptions of entities as their entity embeddings, and then jointly learns the knowledge embeddings and language representations. Experimental results on various NLP tasks, such as relation extraction and entity typing, show that KEPLER achieves results comparable to state-of-the-art knowledge-enhanced PLMs without any additional inference overhead. Furthermore, we construct Wikidata5m, a new large-scale knowledge graph dataset with aligned text descriptions, to evaluate KE methods in both the traditional transductive setting and the challenging inductive setting, which requires models to predict entity embeddings for unseen entities. Experiments demonstrate that KEPLER achieves good results in both settings.
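
The core idea above (entity embeddings computed from textual descriptions, trained jointly with a language modeling objective) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_encode` is a hypothetical hash-based stand-in for a real PLM encoder, the TransE-style score and margin loss are one common KE choice that the abstract does not itself specify, and the joint objective is written as a plain sum of two loss values.

```python
import hashlib
import math

DIM = 8  # toy embedding size (assumption, not from the paper)


def toy_encode(text, dim=DIM):
    """Hypothetical stand-in for a PLM encoder.

    Hashes each token into a fixed-size vector and L2-normalizes the sum.
    A real system would use the PLM's output representation instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()  # 16 bytes
        for i in range(dim):
            vec[i] += digest[i] / 255.0 - 0.5
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def transe_score(head, relation, tail):
    """TransE-style plausibility (assumed scoring function).

    Returns the negative distance between head + relation and tail,
    so values closer to zero mean a more plausible triple.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_ke_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking loss between a true triple and a corrupted one (assumption)."""
    return max(0.0, margin - pos_score + neg_score)


def joint_loss(ke_loss, mlm_loss):
    """Joint objective sketched as the sum of the KE and language modeling losses."""
    return ke_loss + mlm_loss


# Toy descriptions, invented purely for this example.
head = toy_encode("a german astronomer known for laws of planetary motion")
tail = toy_encode("a scientific field that studies celestial objects")
corrupt_tail = toy_encode("a city located on a river")
relation = [0.0] * DIM  # learned relation embedding in a real model; zeros here

ke = margin_ke_loss(
    transe_score(head, relation, tail),
    transe_score(head, relation, corrupt_tail),
)
total = joint_loss(ke, mlm_loss=0.0)  # mlm_loss would come from the PLM
```

Because the entity embedding is a function of the description text rather than a row in a lookup table, an entity unseen during training still receives an embedding as long as it has a description, which is what the inductive setting evaluates.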
