Chapter 8
ach: (i) the set of labeled objects to be used for evaluating a test object's class,1 (ii) a distance or similarity metric that can be used to compute the closeness of objects, (iii) the value of k, the number of nearest neighbors, and (iv) the method used to determine the class of the target object based on the classes and distances of the k nearest neighbors. In its simplest form, kNN can involve assigning an object the class of its nearest neighbor or of the majority of its nearest neighbors, but a variety of enhancements are possible and are discussed below.

1 This need not be the entire training set.

© 2009 by Taylor & Francis Group, LLC
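The four ingredients above can be sketched in a few lines of Python. This is a minimal illustration of the simplest form (Euclidean distance, unweighted majority vote); the function and variable names are ours, not from the text:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training objects.

    `train` is a list of (vector, label) pairs -- ingredient (i), the labeled
    objects. Euclidean distance serves as ingredient (ii), the closeness metric.
    """
    def dist(a, b):
        # (ii) distance metric: Euclidean distance between feature vectors
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Sort the labeled objects by closeness to the query...
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))
    # ...keep the k nearest (iii), and take a majority vote over their labels (iv)
    labels = [label for _, label in neighbors[:k]]
    return Counter(labels).most_common(1)[0][0]
```

For example, with training objects `[((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]`, the query `(1, 0)` has two "A" objects and one "B" object among its three nearest neighbors, so the majority vote returns "A".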
More generally, kNN is a special case of instance-based learning [1]. This includes
case-based reasoning [3], which deals with symbolic data. The kNN approach is also
an example of a lazy learning technique, that is, a technique that waits until the query
arrives to generalize beyond the training data.
Although kNN is a classification technique that is easy to understand
and implement, it performs well in many situations. In particular, a well-known result
by Cover and Hart [6] shows that the classification error2 of the nearest neighbor rule
is bounded above by twice the optimal Bayes error3 under certain reasonable
assumptions. Furthermore, the error of the general kNN method asymptotically approaches
that of the Bayes error and can be used to approximate it.
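The Cover and Hart result mentioned above is commonly stated as follows for the two-class case, where the notation (not taken from the text) uses $R^*$ for the optimal Bayes error and $R_{\mathrm{NN}}$ for the asymptotic error of the nearest neighbor rule:

```latex
% Cover--Hart bound, two-class case:
% R^* is the Bayes error, R_{NN} the asymptotic 1-NN error.
R^* \le R_{\mathrm{NN}} \le 2R^*(1 - R^*) \le 2R^*
```

The middle inequality is the sharper form of the bound; the final inequality gives the "twice the optimal Bayes error" statement quoted in the text.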
Also, because of its simplicity, kNN is easy to modify for more complicated
classification problems. For instance, kNN is particularly well-suited for multimodal classes4
as well as applications in which an object can have many class labels. To illustrate the
last point, for the ass