Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data

Abstract

Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on the performance of different LLMs, specifically for domain-specific data such as medical chart notes. We propose an evaluation approach to analyze the performance of open-source LLMs such as Llama2 and Mistral for medical summarization tasks, using GPT-4 as an assessor. Our innovative approach to quantitative evaluation of LLMs can enable quality control, support the selection of effective LLMs for specific tasks, and advance knowledge discovery in digital health.
