On the Resistance of Nearest Neighbor to Random Noisy Labels

Abstract

Nearest neighbor has always been one of the most appealing non-parametricapproaches in machine learning, pattern recognition, computer vision, etc.Previous empirical studies partly shows that nearest neighbor is resistant tonoise, yet there is a lack of deep analysis. This work presents thefinite-sample and distribution-dependent bounds on the consistency of nearestneighbor in the random noise setting. The theoretical results show that, forasymmetric noises, k-nearest neighbor is robust enough to classify most datacorrectly, except for a handful of examples, whose labels are totally misled byrandom noises. For symmetric noises, however, k-nearest neighbor achieves thesame consistent rate as that of noise-free setting, which verifies theresistance of k-nearest neighbor to random noisy labels. Motivated by thetheoretical analysis, we propose the Robust k-Nearest Neighbor (RkNN) approachto deal with noisy labels. The basic idea is to make unilateral corrections toexamples, whose labels are totally misled by random noises, and classify theothers directly by utilizing the robustness of k-nearest neighbor. We verifythe effectiveness of the proposed algorithm both theoretically and empirically.

Quick Read (beta)

loading the full paper ...