Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

Abstract

ImageNet has been arguably the most popular image classification benchmark, but it is also one with a significant level of label noise. Recent studies have shown that many samples contain multiple classes, even though ImageNet is assumed to be a single-label benchmark. They have thus proposed to turn ImageNet evaluation into a multi-label task, with exhaustive multi-label annotations per image. However, they have not fixed the training set, presumably because of a formidable annotation cost. We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied. With the single-label annotations, a random crop of an image may contain an entirely different object from the ground truth, introducing noisy or even incorrect supervision during training. We thus re-label the ImageNet training set with multi-labels. We address the annotation cost barrier by letting a strong image classifier, trained on an extra source of data, generate the multi-labels. We utilize the pixel-wise multi-label predictions before the final pooling layer, in order to exploit the additional location-specific supervision signals. Training on the re-labeled samples results in improved model performance across the board. ResNet-50 attains a top-1 classification accuracy of 78.9% on ImageNet with our localized multi-labels, which can be further boosted to 80.2% with the CutMix regularization. We show that the models trained with localized multi-labels also outperform the baselines on transfer learning to object detection and instance segmentation tasks, and on various robustness benchmarks. The re-labeled ImageNet training set, pre-trained weights, and the source code are available at https://github.com/naver-ai/relabel_imagenet.
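To make the idea of localized multi-labels concrete, the following is a minimal sketch of how a training target could be derived for one random crop from a stored map of per-location class scores. It is an illustration under stated assumptions, not the released implementation: the function name `crop_soft_label`, the fractional box format, and the use of plain average pooling followed by a softmax are all choices made here for simplicity, and the abstract does not specify the pooling operator.

```python
import numpy as np


def crop_soft_label(label_map, box):
    """Pool per-location class scores inside a crop into one soft label.

    label_map: array of shape (C, H, W) holding class scores taken before
               the classifier's final pooling layer (assumed layout).
    box:       (x0, y0, x1, y1) crop coordinates as fractions of the image
               size, each in [0, 1] (assumed, hypothetical format).
    """
    num_classes, height, width = label_map.shape
    x0, y0, x1, y1 = box

    # Map the fractional box onto grid cells, keeping at least one cell.
    c0 = min(int(np.floor(x0 * width)), width - 1)
    r0 = min(int(np.floor(y0 * height)), height - 1)
    c1 = min(max(int(np.ceil(x1 * width)), c0 + 1), width)
    r1 = min(max(int(np.ceil(y1 * height)), r0 + 1), height)

    # Average the scores over the crop region (a simple stand-in for a
    # region pooling operator), then normalize with a stable softmax.
    pooled = label_map[:, r0:r1, c0:c1].mean(axis=(1, 2))
    exp_scores = np.exp(pooled - pooled.max())
    return exp_scores / exp_scores.sum()


# Toy map: 3 classes on a 4x4 grid. Class 0 occupies the left half of the
# image and class 1 the right half; class 2 is absent.
label_map = np.zeros((3, 4, 4))
label_map[0, :, :2] = 5.0
label_map[1, :, 2:] = 5.0

left = crop_soft_label(label_map, (0.0, 0.0, 0.5, 1.0))
right = crop_soft_label(label_map, (0.5, 0.0, 1.0, 1.0))
full = crop_soft_label(label_map, (0.0, 0.0, 1.0, 1.0))
```

In this toy case the left crop is labeled mostly as class 0 and the right crop mostly as class 1, while the full image splits its mass evenly between the two. A single image-level label would instead assign the same class to every crop, which is the source of the incorrect supervision described above.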
