ERNIE: Enhanced Language Representation with Informative Entities

Abstract

Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code of this paper can be obtained from https://github.com/thunlp/ERNIE.
