Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis

Abstract

Purpose: To explore best-practice approaches for generating synthetic chest X-ray images and augmenting medical imaging datasets to optimize the performance of deep learning models in downstream tasks like classification and segmentation. Materials and Methods: We utilized a latent diffusion model to condition the generation of synthetic chest X-rays on text prompts and/or segmentation masks. We explored methods like using a proxy model and using radiologist feedback to improve the quality of synthetic data. These synthetic images were then generated from relevant disease information or geometrically transformed segmentation masks and added to ground truth training set images from the CheXpert, CANDID-PTX, SIIM, and RSNA Pneumonia datasets to measure improvements in classification and segmentation model performance on the test sets. F1 and Dice scores were used to evaluate classification and segmentation, respectively. One-tailed t-tests with Bonferroni correction assessed the statistical significance of performance improvements with synthetic data. Results: Across all experiments, the synthetic data we generated resulted in a maximum mean classification F1 score improvement of 0.150453 (CI: 0.099108-0.201798; P=0.0031) compared to using only real data. For segmentation, the maximum Dice score improvement was 0.14575 (CI: 0.108267-0.183233; P=0.0064). Conclusion: Best practices for generating synthetic chest X-ray images for downstream tasks include conditioning on single-disease labels or geometrically transformed segmentation masks, as well as potentially using proxy modeling for fine-tuning such generations.
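
The evaluation named in the abstract (F1 for classification, Dice for segmentation, Bonferroni correction across multiple comparisons) can be illustrated with a minimal sketch. It uses the common textbook definitions and assumes binary labels and flat binary masks. The function names are illustrative, not taken from the paper, and the authors' actual implementation may differ. The t-test itself is omitted; the sketch only shows how already-computed p-values would be adjusted.

```python
def f1_score(y_true, y_pred):
    """Binary F1 from two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def dice_score(mask_true, mask_pred):
    """Dice coefficient between two flat binary masks (sequences of 0/1)."""
    intersection = sum(t * p for t, p in zip(mask_true, mask_pred))
    total = sum(mask_true) + sum(mask_pred)
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * intersection / total if total else 1.0


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the paper.
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
    print(bonferroni([0.01, 0.04, 0.5]))
```

For binary masks, Dice and F1 are the same quantity computed over pixels instead of over images, which is why the two metrics can be read on a comparable scale.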
