Leveraging Prompt-Learning for Structured Information Extraction from Crohn's Disease Radiology Reports in a Low-Resource Language

  • 2024-05-02 20:11:54
  • Liam Hazan, Gili Focht, Naama Gavrielov, Roi Reichart, Talar Hagopian, Mary-Louise C. Greer, Ruth Cytter Kuint, Dan Turner, Moti Freiman
Automatic conversion of free-text radiology reports into structured data using Natural Language Processing (NLP) techniques is crucial for analyzing diseases at scale. While effective for tasks in widely spoken languages like English, generative large language models (LLMs) typically underperform in less common languages and can pose risks to patient privacy. Fine-tuning local NLP models is hindered by the skewed nature of real-world medical datasets, in which rare findings create a significant class imbalance. We introduce SMP-BERT, a novel prompt-learning method that leverages the structured nature of reports to overcome these challenges. In our study of a large collection of Crohn's disease radiology reports in Hebrew (over 8,000 patients and 10,000 reports), SMP-BERT substantially outperformed traditional fine-tuning, notably in detecting infrequent conditions (AUC: 0.99 vs. 0.94; F1: 0.84 vs. 0.34). SMP-BERT enables more accurate AI diagnostics for low-resource languages.
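The baseline's pairing of a high AUC (0.94) with a low F1 (0.34) on infrequent conditions can seem contradictory. The minimal, stdlib-only Python sketch below uses invented toy scores (not data from the study) to show how this can happen. ROC AUC measures how well scores rank positives above negatives. F1 depends on a fixed decision threshold, so among rare positives a single missed case and a couple of false positives pull F1 down even when the ranking is good. The helper names `roc_auc` and `f1_at_threshold`, the 0.5 threshold, and the data are assumptions for illustration only.

```python
"""Toy illustration (hypothetical data, not from the study): on a rare class,
ROC AUC (a ranking metric) can stay high while F1 at a fixed threshold is low."""


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 for the positive class after binarizing scores at `threshold`."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical finding present in 2 of 20 reports (10% prevalence).
labels = [1, 1] + [0] * 18
scores = [0.9, 0.4] + [0.6, 0.55, 0.35] + [0.1] * 15

auc = roc_auc(labels, scores)        # 34 of 36 positive/negative pairs ranked correctly
f1 = f1_at_threshold(labels, scores)  # TP=1, FP=2, FN=1
print(f"AUC={auc:.2f} F1={f1:.2f}")  # AUC=0.94 F1=0.40
```

Here the model ranks almost every positive above every negative, yet at threshold 0.5 it catches only one of the two positives and raises two false alarms, so F1 stays at 0.4.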


