Abstract
Pictures of everyday life are inherently multi-label in nature. Hence, multi-label classification is commonly used to analyze their content. In typical multi-label datasets, each picture contains only a few positive labels and many negative ones. This positive-negative imbalance can result in under-emphasizing gradients from positive labels during training, leading to poor accuracy. In this paper, we introduce a novel asymmetric loss ("ASL") that operates differently on positive and negative samples. The loss dynamically down-weights the importance of easy negative samples, causing the optimization process to focus more on the positive samples, and also enables discarding mislabeled negative samples. We demonstrate how ASL leads to a more "balanced" network, with increased average probabilities for positive samples, and show how this balanced network translates to better mAP scores compared to commonly used losses. Furthermore, we offer a method that can dynamically adjust the level of asymmetry throughout training. With ASL, we reach new state-of-the-art results on three common multi-label datasets, including 86.6% mAP on MS-COCO. We also demonstrate ASL's applicability to other tasks such as fine-grained single-label classification and object detection. ASL is effective, easy to implement, and does not increase training time or complexity. Implementation is available at: https://github.com/Alibaba-MIIL/ASL.
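To make the idea of treating positive and negative samples differently concrete, the following is a minimal sketch of an asymmetric per-label loss, not the reference implementation (which is available at the repository linked above). The abstract does not state the formula, so the form used here is an assumption: a focal-style term with separate exponents for positives and negatives, plus a probability margin applied to negatives only. The function name, the parameter names (`gamma_pos`, `gamma_neg`, `margin`), and their default values are illustrative.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-label asymmetric losses for a single image.

    probs   : predicted probabilities in [0, 1], one per label
    targets : 1 for a positive label, 0 for a negative label
    All hyperparameter names and defaults are assumptions for illustration.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: focal-style weight with its own exponent.
            # With gamma_pos = 0 this reduces to plain cross-entropy.
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin and
            # clip at zero, so negatives with p <= margin contribute no loss.
            p_m = max(p - margin, 0.0)
            # A larger exponent down-weights the remaining easy negatives.
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total
```

Under these assumptions, setting `gamma_pos = gamma_neg = 0` and `margin = 0` recovers ordinary binary cross-entropy, which makes the sketch easy to compare against a symmetric baseline.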