Identifying noisy labels with a transductive semi-supervised leave-one-out filter

Abstract

Obtaining data with meaningful labels is often costly and error-prone. In this situation, semi-supervised learning (SSL) approaches are attractive, as they leverage assumptions about the unlabeled data to make up for the limited number of labels. However, in real-world situations, we cannot assume that the labeling process is infallible, and the accuracy of many SSL classifiers decreases significantly in the presence of label noise. In this work, we introduce LGC_LVOF, a leave-one-out filtering approach based on the Local and Global Consistency (LGC) algorithm. Our method aims to detect and remove wrong labels, and thus can be used as a preprocessing step for any SSL classifier. Given the propagation matrix, detecting noisy labels takes $O(cl)$ per step, where $c$ is the number of classes and $l$ is the number of labeled instances. Moreover, one does not need to compute the whole propagation matrix, but only an $l$ by $l$ submatrix corresponding to interactions between labeled instances. As a result, our approach is best suited to datasets with a large amount of unlabeled data but few labels. Results are provided for a number of datasets, including MNIST and ISOLET. LGC_LVOF appears to be equally or more precise than the adapted gradient-based filter. We show that the best-case accuracy of embedding LGC_LVOF into LGC is comparable to the best-case accuracy of $\ell_1$-based classifiers designed to be robust to label noise. We also provide a heuristic for choosing the number of removed instances.
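
To make the idea of a leave-one-out check on the labeled submatrix concrete, the following is a minimal sketch and not the filter defined in the paper. It assumes the standard LGC propagation matrix $P = (I - \alpha S)^{-1}$ with a Gaussian affinity, and it uses a simplified criterion chosen for illustration: a label is flagged when the scores propagated from all other labeled instances disagree with it. The function names, the parameters `alpha` and `sigma`, and the flagging rule are all assumptions.

```python
import numpy as np

def lgc_propagation_submatrix(X, labeled_idx, alpha=0.9, sigma=1.0):
    """Return the l-by-l block of P = (I - alpha*S)^{-1} for the labeled rows/columns.

    Assumption: Gaussian affinity and symmetric normalization, as in standard LGC.
    Only the l columns of P that belong to labeled instances are solved for.
    """
    n = X.shape[0]
    l = len(labeled_idx)
    # Pairwise squared distances and Gaussian affinities with zero diagonal.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    # Indicator columns selecting the labeled instances.
    E = np.zeros((n, l))
    E[labeled_idx, np.arange(l)] = 1.0
    P_cols = np.linalg.solve(np.eye(n) - alpha * S, E)  # shape (n, l)
    return P_cols[labeled_idx, :]                       # shape (l, l)

def leave_one_out_flags(P_ll, y, n_classes):
    """Flag labels that disagree with the scores propagated from the other labels.

    Assumption: simplified illustrative criterion, not the paper's exact rule.
    """
    Y = np.eye(n_classes)[y]                  # one-hot labels, shape (l, c)
    F = P_ll @ Y                              # scores using all labeled instances
    F_loo = F - np.diag(P_ll)[:, None] * Y    # remove each instance's own contribution
    return np.argmax(F_loo, axis=1) != y
```

In this sketch, the returned boolean mask marks candidate noisy labels; removing the flagged instances before running an SSL classifier corresponds to using the filter as a preprocessing step.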
