Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models

  • 2025-08-20 14:24:25
  • Anindya Bijoy Das, Shibbir Ahmed, Shahnewaz Karim Sakib

Abstract

Clinical summarization is crucial in healthcare as it distills complex medical data into digestible information, enhancing patient understanding and care management. Large language models (LLMs) have shown significant potential in automating and improving the accuracy of such summarizations due to their advanced natural language understanding capabilities. These models are particularly applicable in the context of summarizing medical/clinical texts, where precise and concise information transfer is essential. In this paper, we investigate the effectiveness of open-source LLMs in extracting key events from discharge reports, including admission reasons, major in-hospital events, and critical follow-up actions. In addition, we assess the prevalence of various types of hallucinations in the summaries produced by these models. Detecting hallucinations is vital as it directly influences the reliability of the information, potentially affecting patient care and treatment outcomes. We conduct comprehensive simulations to rigorously evaluate the performance of these models, further probing the accuracy and fidelity of the extracted content in clinical summarization. Our results reveal that while the LLMs (e.g., Qwen2.5 and DeepSeek-v2) perform quite well in capturing admission reasons and hospitalization events, they are generally less consistent when it comes to identifying follow-up recommendations, highlighting broader challenges in leveraging LLMs for comprehensive summarization.

 
