Abstract
ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks. The ImageNet-21K dataset, which contains more pictures and classes, is used less frequently for pretraining, mainly due to its complexity and an underestimation of its added value compared to standard ImageNet-1K pretraining. This paper aims to close this gap and make high-quality, efficient pretraining on ImageNet-21K available for everyone. Via a dedicated preprocessing stage, utilizing WordNet hierarchies, and a novel training scheme called semantic softmax, we show that various models, including small mobile-oriented models, significantly benefit from ImageNet-21K pretraining on numerous datasets and tasks. We also show that we outperform previous ImageNet-21K pretraining schemes for prominent new models like ViT. Our proposed pretraining pipeline is efficient, accessible, and leads to reproducible SoTA results from a publicly available dataset. The training code and pretrained models are available at: https://github.com/Alibaba-MIIL/ImageNet21K