MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs

Abstract

Chest radiography is an extremely powerful imaging modality, allowing for a detailed inspection of a patient's thorax, but requiring specialized training for proper interpretation. With the advent of high-performance, general-purpose computer vision algorithms, the accurate automated analysis of chest radiographs is of increasing interest to researchers. However, a key challenge in the development of these techniques is the lack of sufficient data. Here we describe MIMIC-CXR-JPG v2.0.0, a large dataset of 377,110 chest x-rays associated with 227,827 imaging studies sourced from the Beth Israel Deaconess Medical Center between 2011 and 2016. Images are provided with 14 labels derived from two natural language processing tools applied to the corresponding free-text radiology reports. MIMIC-CXR-JPG is derived entirely from the MIMIC-CXR database and aims to provide a convenient processed version of MIMIC-CXR, as well as a standard reference for data splits and image labels. All images have been de-identified to protect patient privacy. The dataset is made freely available to facilitate and encourage a wide range of research in medical computer vision.