Finding Relevant Points for Nearest-Neighbor Classification

Abstract

In nearest-neighbor classification problems, a set of $d$-dimensional training points are given, each with a known classification, and are used to infer unknown classifications of other points by using the same classification as the nearest training point. A training point is relevant if its omission from the training set would change the outcome of some of these inferences. We provide a simple algorithm for thinning a training set down to its subset of relevant points, using as subroutines algorithms for finding the minimum spanning tree of a set of points and for finding the extreme points (convex hull vertices) of a set of points. The time bounds for our algorithm, in any constant dimension $d\ge 3$, improve on a previous algorithm for the same problem by Clarkson (FOCS 1994).
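
To make the definition of a relevant point concrete, here is a minimal sketch in Python. It is not the algorithm of the paper, which relies on minimum spanning trees and extreme points. It only illustrates the definition in the one-dimensional special case, where (assuming distinct coordinates) a training point is relevant exactly when one of its neighbors in sorted order has a different classification. The function names `nn_classify` and `relevant_points_1d` and the data layout (a list of `(point, label)` pairs, with each point a tuple of coordinates) are assumptions made for this illustration.

```python
from math import dist


def nn_classify(training, query):
    """Return the label of the training point nearest to `query`.

    `training` is a list of (point, label) pairs; each point is a tuple
    of coordinates with the same dimension as `query`.
    """
    _, label = min(training, key=lambda pair: dist(pair[0], query))
    return label


def relevant_points_1d(training):
    """Return the relevant points of a one-dimensional training set.

    Assumes all coordinates are distinct. In one dimension, removing a
    point hands its region to its sorted-order neighbors, so the point
    matters only if at least one of those neighbors has another label.
    """
    pts = sorted(training, key=lambda pair: pair[0])
    relevant = []
    for i, (point, label) in enumerate(pts):
        left_differs = i > 0 and pts[i - 1][1] != label
        right_differs = i + 1 < len(pts) and pts[i + 1][1] != label
        if left_differs or right_differs:
            relevant.append((point, label))
    return relevant


# Hypothetical example data: three points labeled "a", then two labeled "b".
training = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
            ((3.0,), "b"), ((4.0,), "b")]
thinned = relevant_points_1d(training)
```

In this example only the two points on either side of the class boundary survive thinning, and `nn_classify` gives the same answer on the thinned set as on the full set for any query that is not equidistant from two training points.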
