Abstract
Two deep learning (DL) models, CheXNet and CheXDet, were developed using radiograph-level annotations (yes or no disease) and fine-grained lesion-level annotations (lesion bounding boxes), respectively. The models' internal classification and lesion localization performance were compared on a testing set (n=2,922), external classification performance was compared on the NIH-Google (n=4,376) and PadChest (n=24,536) datasets, and external lesion localization performance was compared on the NIH-ChestX-ray14 dataset (n=880). The models were also compared to radiologists on a subset of the internal testing set (n=496). Given sufficient training data, both models performed comparably to radiologists. CheXDet achieved significant improvement over CheXNet for external classification, such as in classifying fracture on NIH-Google (CheXDet area under the ROC curve [AUC]: 0.67, CheXNet AUC: 0.51; p<.001) and PadChest (CheXDet AUC: 0.78, CheXNet AUC: 0.55; p<.001). CheXDet achieved higher lesion detection performance than CheXNet for most abnormalities on all datasets, such as in detecting pneumothorax on the internal set (CheXDet jackknife alternative free-response ROC figure of merit [JAFROC-FOM]: 0.87, CheXNet JAFROC-FOM: 0.13; p<.001) and NIH-ChestX-ray14 (CheXDet JAFROC-FOM: 0.55, CheXNet JAFROC-FOM: 0.04; p<.001). To summarize, fine-grained annotations overcame shortcut learning and enabled DL models to identify correct lesion patterns, improving the models' generalizability.