Abstract
Large Language Models (LLMs) have shown remarkable capabilities across various domains, yet they struggle with knowledge-intensive tasks in areas that demand factual accuracy, e.g., industrial automation and healthcare. Key limitations include their tendency to hallucinate, lack of source traceability (provenance), and challenges in timely knowledge updates. Combining language models with knowledge graphs (GraphRAG) offers promising avenues for overcoming these deficits. However, a major challenge lies in creating such a knowledge graph in the first place. Here, we propose a novel approach that combines LLMs with a tripartite knowledge graph representation. Starting from an initial lexical graph, this representation is constructed through a concept-anchored pre-analysis of source documents, which connects complex, domain-specific objects via a curated ontology of corresponding, domain-specific concepts to relevant sections within chunks of text. Subsequently, we formulate LLM prompt creation as an unsupervised node classification problem, allowing for the optimization of information density, coverage, and arrangement of LLM prompts at significantly reduced lengths. An initial experimental evaluation of our approach on a healthcare use case, involving multi-faceted analyses of patient anamneses given a set of medical concepts as well as a collection of clinical guideline literature, indicates its potential to optimize information density, coverage, and arrangement of LLM prompts while significantly reducing their lengths, which, in turn, may lead to reduced costs as well as more consistent and reliable LLM outputs.