Terabyte-scale Deep Multiple Instance Learning for Classification and Localization in Pathology

  • 2018-05-17 22:43:46
  • Gabriele Campanella, Vitor Werneck Krauss Silva, Thomas J. Fuchs

Abstract

In the field of computational pathology, the use of decision support systems powered by state-of-the-art deep learning solutions has been hampered by the lack of large labeled datasets. Until recently, studies relied on datasets on the order of a few hundred slides, which are not enough to train a model that can work at scale in the clinic. Here, we have gathered a dataset consisting of 12,160 slides, two orders of magnitude larger than previous datasets in pathology and equivalent to 25 times the pixel count of the entire ImageNet dataset. Given the size of our dataset, it is possible for us to train a deep learning model under the Multiple Instance Learning (MIL) assumption, where only the overall slide diagnosis is necessary for training, avoiding all the expensive pixel-wise annotations that are usually part of supervised learning approaches. We test our framework on a complex task, that of prostate cancer diagnosis on needle biopsies. We performed a thorough evaluation of the performance of our MIL pipeline under several conditions, achieving an AUC of 0.98 on a held-out test set of 1,824 slides. These results open the way for training accurate diagnosis prediction models at scale, laying the foundation for decision support system deployment in the clinic.
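
To make the MIL assumption concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the standard MIL formulation: a slide is treated as a bag of tiles, a negative slide contains only negative tiles, and a positive slide contains at least one positive tile, so the slide-level score can be taken as the maximum tile probability. The function names, the threshold of 0.5, and the example probabilities are all hypothetical and chosen only for illustration.

```python
from typing import List


def slide_score(tile_probs: List[float]) -> float:
    """Aggregate tile-level tumor probabilities into one slide-level score.

    Assumption: max-pooling over tiles, the usual reading of the MIL
    assumption (one positive tile is enough to make the slide positive).
    """
    if not tile_probs:
        raise ValueError("a slide must contain at least one tile")
    return max(tile_probs)


def top_tile_index(tile_probs: List[float]) -> int:
    """Return the index of the most suspicious tile.

    Under MIL, this tile can inherit the slide label during training and
    also serves as a coarse localization of the suspected lesion.
    """
    if not tile_probs:
        raise ValueError("a slide must contain at least one tile")
    return max(range(len(tile_probs)), key=lambda i: tile_probs[i])


def predict_slide(tile_probs: List[float], threshold: float = 0.5) -> bool:
    """Call the slide positive if its aggregated score reaches the threshold."""
    return slide_score(tile_probs) >= threshold


# Hypothetical tile probabilities for two slides.
positive_like = [0.1, 0.9, 0.3]
negative_like = [0.1, 0.2, 0.05]

print(predict_slide(positive_like), top_tile_index(positive_like))
print(predict_slide(negative_like))
```

Because only the slide-level label enters this scheme, no pixel-wise annotation is required; the highest-scoring tile gives an approximate location as a by-product.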

 
