Evaluating Retrieval-Augmented Generation vs. Long-Context Input for Clinical Reasoning over EHRs

  • 2025-08-20 16:09:37
  • Skatje Myers, Dmitriy Dligach, Timothy A. Miller, Samantha Barr, Yanjun Gao, Matthew Churpek, Anoop Mayampurath, Majid Afshar

Abstract

Electronic health records (EHRs) are long, noisy, and often redundant, posing a major challenge for the clinicians who must navigate them. Large language models (LLMs) offer a promising solution for extracting and reasoning over this unstructured text, but the length of clinical notes often exceeds even state-of-the-art models' extended context windows. Retrieval-augmented generation (RAG) offers an alternative by retrieving task-relevant passages from across the entire EHR, potentially reducing the amount of required input tokens. In this work, we propose three clinical tasks designed to be replicable across health systems with minimal effort: 1) extracting imaging procedures, 2) generating timelines of antibiotic use, and 3) identifying key diagnoses. Using EHRs from actual hospitalized patients, we test three state-of-the-art LLMs with varying amounts of provided context, using either targeted text retrieval or the most recent clinical notes. We find that RAG closely matches or exceeds the performance of using recent notes, and approaches the performance of using the models' full context while requiring drastically fewer input tokens. Our results suggest that RAG remains a competitive and efficient approach even as newer models become capable of handling increasingly longer amounts of text.

 
