ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training

  • 2024-03-19 12:01:35
  • Rongsheng Wang, Qingsong Yao, Haoran Lai, Zhiyang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou

Abstract

Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent entity-specific context within radiology reports and the complex cross-modality contextual relationships between text and images. To close this gap, we propose a novel Entity-centered Context-aware Medical Vision-language Pre-training (ECAMP) framework, which is designed to enable a more entity-centered and context-sensitive interpretation of medical data. Utilizing the recent powerful large language model, we distill entity-centered context from medical reports, which enables ECAMP to gain more effective supervision from the text modality. By further pre-training our model with carefully designed entity-aware, context-enhanced masked language modeling and context-guided super-resolution tasks, ECAMP significantly refines the interplay between text and image modalities, leading to an enhanced ability to extract entity-centered contextual features. Besides, our proposed multi-scale context fusion design also improves the semantic integration of both coarse and fine-level image representations, prompting better performance for multi-scale downstream applications. Combining these components leads to significant performance leaps over current state-of-the-art methods and establishes a new standard for cross-modality learning in medical imaging, whose effectiveness is demonstrated by our extensive experiments on various tasks including classification, segmentation, and detection across several public datasets. Code and models are available at https://github.com/ToniChopp/ECAMP.

 
