Abstract
Artificial Intelligence is revolutionizing medical practice, enhancing diagnostic accuracy and healthcare delivery. However, its adaptation to medical settings still faces significant challenges related to data availability and privacy constraints. Synthetic data has emerged as a promising solution to mitigate these issues, addressing data scarcity while preserving privacy. Recently, Latent Diffusion Models have emerged as a powerful tool for generating high-quality synthetic data. Meanwhile, the integration of different modalities has gained interest, emphasizing the need for models capable of handling multimodal medical data. Existing approaches struggle to integrate complementary information and lack the ability to generate modalities simultaneously. To address this challenge, we present MedCoDi-M, a 6.77-billion-parameter model designed for multimodal medical data generation that, following the Foundation Model paradigm, exploits contrastive learning and large quantities of data to build a shared latent space that captures the relationships between different data modalities. Further, we introduce the Multi-Prompt training technique, which significantly boosts MedCoDi-M's generation performance under different settings. We extensively validate MedCoDi-M. First, we benchmark it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for chest X-ray and radiological report generation. Second, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we assess the utility of MedCoDi-M in addressing key challenges in the medical field, such as anonymization, data scarcity, and imbalanced learning. The results are promising, demonstrating the applicability of MedCoDi-M in medical contexts. The project page is available at https://cosbidev.github.io/MedCoDi-M/.