8 Cluster Analysis: Basic Concepts and Algorithms
Cluster analysis divides data into groups (clusters) that are meaningful, useful,
or both. If meaningful groups are the goal, then the clusters should capture the
natural structure of the data. In some cases, however, cluster analysis is only a
useful starting point for other purposes, such as data summarization. Whether
for understanding or utility, cluster analysis has long played an important
role in a wide variety of fields: psychology and other social sciences, biology,
statistics, pattern recognition, information retrieval, machine learning, and
data mining.
There have been many applications of cluster analysis to practical problems.
We provide some specific examples, organized by whether the purpose
of the clustering is understanding or utility.
Clustering for Understanding
Classes, or conceptually meaningful groups
of objects that share common characteristics, play an important role in how
people analyze and describe the world. Indeed, human beings are skilled at
dividing objects into groups (clustering) and assigning particular objects to
these groups (classification). For example, even relatively young children can
quickly label the objects in a photograph as buildings, vehicles, people, animals,
plants, etc. In the context of understanding data, clusters are potential
classes and cluster analysis is the study of techniques for automatically finding
classes. The following are some examples:
• Biology. Biologists have spent many years creating a taxonomy (hierarchical
classification) of all living things: kingdom, phylum, class,
order, family, genus, and species. Thus, it is perhaps not surprising that
much of the early work in cluster analysis sought to create a discipline
of mathematical taxonomy that could automatically find such classification
structures. More recently, biologists have applied clustering to
analy