RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment

  • 2025-07-22 17:16:32
  • Difei Gu, Yunhe Gao, Yang Zhou, Mu Zhou, Dimitris Metaxas

Abstract

Automated chest radiograph interpretation requires both accurate disease classification and detailed radiology report generation, presenting a significant challenge in the clinical workflow. Current approaches either focus on classification accuracy at the expense of interpretability or generate detailed but potentially unreliable reports through image captioning techniques. In this study, we present RadAlign, a novel framework that combines the predictive accuracy of vision-language models (VLMs) with the reasoning capabilities of large language models (LLMs). Inspired by the radiologist's workflow, RadAlign first employs a specialized VLM to align visual features with key medical concepts, achieving superior disease classification with an average AUC of 0.885 across multiple diseases. These recognized medical conditions, represented as text-based concepts in the aligned visual-language space, are then used to prompt LLM-based report generation. Enhanced by a retrieval-augmented generation mechanism that grounds outputs in similar historical cases, RadAlign delivers superior report quality with a GREEN score of 0.678, outperforming the 0.634 achieved by state-of-the-art methods. Our framework maintains strong clinical interpretability while reducing hallucinations, advancing automated medical imaging and report analysis through integrated predictive and generative AI. Code is available at https://github.com/difeigu/RadAlign.
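
The flow described in the abstract (concept recognition, retrieval of similar historical cases, then prompting an LLM) can be illustrated with a minimal sketch. This is not the authors' implementation; the linked repository is the authoritative source. Every name below (`recognized_concepts`, `retrieve_similar`, `build_prompt`), the 0.5 decision threshold, the use of cosine similarity for retrieval, and the prompt wording are assumptions made purely for illustration. The concept scores stand in for the output of a VLM, and the final LLM call is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognized_concepts(scores, threshold=0.5):
    """Keep concepts whose score meets the threshold (threshold value is an assumption)."""
    return sorted(name for name, p in scores.items() if p >= threshold)


def retrieve_similar(query_vec, case_bank, k=2):
    """Return the k historical cases whose embeddings are closest to the query."""
    ranked = sorted(
        case_bank,
        key=lambda case: cosine(query_vec, case["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(concepts, similar_cases):
    """Assemble a text prompt from recognized concepts and retrieved reports."""
    findings = ", ".join(concepts) if concepts else "no abnormal findings"
    examples = "\n".join(f"- {case['report']}" for case in similar_cases)
    return (
        f"Recognized findings: {findings}\n"
        f"Reports from similar prior cases:\n{examples}\n"
        "Write a radiology report consistent with the findings above."
    )


# Toy inputs standing in for VLM outputs and a bank of historical cases.
scores = {"cardiomegaly": 0.91, "pneumothorax": 0.08, "edema": 0.62}
bank = [
    {"embedding": [1.0, 0.0], "report": "Enlarged cardiac silhouette."},
    {"embedding": [0.0, 1.0], "report": "No acute findings."},
    {"embedding": [0.9, 0.1], "report": "Mild pulmonary edema."},
]

concepts = recognized_concepts(scores)
similar = retrieve_similar([1.0, 0.0], bank, k=2)
prompt = build_prompt(concepts, similar)
print(prompt)
```

In this sketch the prompt contains only text: the recognized concepts and the retrieved reports. That mirrors the abstract's description of text-based concepts being used to prompt report generation, with retrieved cases supplying grounding.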

 
