Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages

  • 2025-05-02 09:14:03
  • Marco Salmè, Rosa Sicilia, Paolo Soda, Valerio Guarrasi

Abstract

The integration of artificial intelligence in healthcare has opened new horizons for improving medical diagnostics and patient care. However, challenges persist in developing systems capable of generating accurate and contextually relevant radiology reports, particularly in low-resource languages. In this study, we present a comprehensive benchmark to evaluate the performance of instruction-tuned Vision-Language Models (VLMs) in the specialized task of radiology report generation across three low-resource languages: Italian, German, and Spanish. Employing the LLaVA architectural framework, we conducted a systematic evaluation of pre-trained models utilizing general datasets, domain-specific datasets, and low-resource language-specific datasets. In light of the unavailability of models that possess prior knowledge of both the medical domain and low-resource languages, we analyzed various adaptations to determine the most effective approach for these contexts. The results revealed that language-specific models substantially outperformed both general and domain-specific models in generating radiology reports, emphasizing the critical role of linguistic adaptation. Additionally, models fine-tuned with medical terminology exhibited enhanced performance across all languages compared to models with generic knowledge, highlighting the importance of domain-specific training. We also explored the influence of the temperature parameter on the coherence of report generation, providing insights for optimal model settings. Our findings highlight the importance of tailored language and domain-specific training for improving the quality and accuracy of radiological reports in multilingual settings. This research not only advances our understanding of VLMs' adaptability in healthcare but also points to significant avenues for future investigations into model tuning and language-specific adaptations.

 
