Language Models Sound the Death Knell of Knowledge Graphs

Abstract

The healthcare domain generates a lot of unstructured and semi-structured text. Natural Language Processing (NLP) has been used extensively to process this data. Deep learning-based NLP, especially Large Language Models (LLMs) such as BERT, has found broad acceptance and is used extensively for many applications. A Language Model is a probability distribution over a word sequence. Self-supervised learning on a large corpus of data automatically generates deep learning-based language models. BioBERT and Med-BERT are language models pre-trained for the healthcare domain. Healthcare uses typical NLP tasks such as question answering, information extraction, named entity recognition, and search to simplify and improve processes. However, to ensure robust application of the results, NLP practitioners need to normalize and standardize them. One of the main ways of achieving normalization and standardization is the use of Knowledge Graphs. A Knowledge Graph captures concepts and their relationships for a specific domain, but its creation is time-consuming and requires manual intervention from domain experts, which can prove expensive. SNOMED CT (Systematized Nomenclature of Medicine -- Clinical Terms), Unified Medical Language System (UMLS), and Gene Ontology (GO) are popular ontologies from the healthcare domain. SNOMED CT and UMLS capture concepts such as diseases, symptoms, and diagnoses, and GO is the world's largest source of information on the functions of genes. Healthcare has been dealing with an explosion in information about different types of drugs, diseases, and procedures. This paper argues that using Knowledge Graphs is not the best solution for solving problems in this domain. We present experiments using LLMs for the healthcare domain to demonstrate that language models provide the same functionality as knowledge graphs, thereby making knowledge graphs redundant.
