ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training

  • 2024-03-19 12:01:35
  • Rongsheng Wang, Qingsong Yao, Haoran Lai, Zhiyang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou


Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent entity-specific context within radiology reports and the complex cross-modality contextual relationships between text and images. To close this gap, we propose a novel Entity-centered Context-aware Medical Vision-language Pre-training (ECAMP) framework, which is designed to enable a more entity-centered and context-sensitive interpretation of medical data. Utilizing the recent powerful large language model, we distill entity-centered context from medical reports, which enables ECAMP to gain more effective supervision from the text modality. By further pre-training our model with carefully designed entity-aware, context-enhanced masked language modeling and context-guided super-resolution tasks, ECAMP significantly refines the interplay between text and image modalities, leading to an enhanced ability to extract entity-centered contextual features. Besides, our proposed multi-scale context fusion design also improves the semantic integration of both coarse and fine-level image representations, prompting better performance for multi-scale downstream applications. Combining these components leads to significant performance leaps over current state-of-the-art methods and establishes a new standard for cross-modality learning in medical imaging, whose effectiveness is demonstrated by our extensive experiments on various tasks including classification, segmentation, and detection across several public datasets. Code and models are available at

