Abstract
The de-identification (deID) of protected health information (PHI) and personally identifiable information (PII) is a fundamental requirement for sharing medical images, particularly through public repositories, to ensure compliance with patient privacy laws. In addition, preservation of non-PHI metadata to inform and enable downstream development of imaging artificial intelligence (AI) is an important consideration in biomedical research. The goal of MIDI-B was to provide a standardized platform for benchmarking DICOM image deID tools based on a set of rules conformant to the HIPAA Safe Harbor regulation, the DICOM Attribute Confidentiality Profiles, and best practices in preservation of research-critical metadata, as defined by The Cancer Imaging Archive (TCIA). The challenge employed a large, diverse, multi-center, and multi-modality set of real de-identified radiology images with synthetic PHI/PII inserted. The MIDI-B Challenge consisted of three phases: training, validation, and test. Eighty individuals registered for the challenge. In the training phase, we encouraged participants to tune their algorithms using their in-house or public data. The validation and test phases utilized DICOM images containing synthetic identifiers (from 216 and 322 subjects, respectively). Ten teams successfully completed the test phase of the challenge. To measure the success of a rule-based approach to image deID, scores were computed as the percentage of correct actions out of the total number of required actions. The scores ranged from 97.91% to 99.93%. Participants employed a variety of open-source and proprietary tools with customized configurations, large language models, and optical character recognition (OCR). In this paper, we provide a comprehensive report on the MIDI-B Challenge's design, implementation, results, and lessons learned.