Improving Biomedical Pretrained Language Models with Knowledge

Abstract

Pretrained language models have shown success in many natural language processing tasks, and many works explore incorporating knowledge into language models. In the biomedical domain, experts have spent decades building large-scale knowledge bases. For example, the Unified Medical Language System (UMLS) contains millions of entities with their synonyms and defines hundreds of relations among entities. Leveraging this knowledge can benefit a variety of downstream tasks such as named entity recognition and relation extraction. To this end, we propose KeBioLM, a biomedical pretrained language model that explicitly leverages knowledge from the UMLS knowledge bases. Specifically, we extract entities from PubMed abstracts and link them to UMLS. We then train a knowledge-aware language model that first applies a text-only encoding layer to learn entity representations and then applies a text-entity fusion encoding to aggregate entity representations. In addition, we add two training objectives: entity detection and entity linking. Experiments on the named entity recognition and relation extraction tasks from the BLURB benchmark demonstrate the effectiveness of our approach. Further analysis on a collected probing dataset shows that our model is better able to model medical knowledge.
