Data Mining
Cluster Analysis: Basic Concepts
and Algorithms
Lecture Notes for Chapter 8
Introduction to Data Mining
by
Tan, Steinbach, Kumar
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1
What is Cluster Analysis?
O Finding groups of objects such that the objects in a group
will be similar (or related) to one another and different
from (or unrelated to) the objects in other groups
[Figure: intra-cluster distances are minimized; inter-cluster distances are maximized]
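The two criteria in the figure can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `mean_intra_inter`, the toy points, and both label assignments are hypothetical, and Euclidean distance is assumed as the dissimilarity measure.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_intra_inter(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Every pair of points is examined once: pairs sharing a label count as
    intra-cluster, pairs with different labels count as inter-cluster.
    Assumes at least one pair of each kind exists.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

good_labels = [0, 0, 0, 1, 1, 1]  # follows the natural groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_intra, good_inter = mean_intra_inter(points, good_labels)
bad_intra, bad_inter = mean_intra_inter(points, bad_labels)
```

Under these assumptions, the labeling that follows the natural groups gives a smaller mean intra-cluster distance and a larger mean inter-cluster distance than the mixed labeling.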
Applications of Cluster Analysis
O Understanding
– Group related documents for browsing, group genes
and proteins that have similar functionality, or
group stocks with similar price fluctuations

Table: Discovered Clusters, by Industry Group
1. Technology1-DOWN: Applied-Matl-DOWN, work-Down, -DOWN,
   Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, m-DOWN, INTEL-DOWN,
   LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-Down,
   Tellabs-Inc-Down, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN,
   Sun-DOWN
2. Technology2-DOWN: p-DOWN, Autodesk-DOWN, DEC-DOWN,
   ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, puter-Assoc-DOWN,
   Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN,
   Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN
3. Financial-DOWN: Fannie-Mae-DOWN, Fed-Home-Loan-DOWN,
   MBNA-Corp-DOWN, Morgan-Stanley-DOWN
4. Oil-UP: Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP,
   Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP
O Summarization
– Reduce the size of large data sets
[Figure: clustering precipitation in Australia]
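One common way to use clusters for summarization is to replace each cluster with a single representative, such as its centroid. The sketch below illustrates that idea under stated assumptions: the function name `summarize`, the sample points, and the labels are hypothetical, and the cluster labels are taken as already given rather than computed.

```python
from collections import defaultdict


def summarize(points, labels):
    """Map each cluster label to the centroid (coordinate-wise mean) of its points."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in groups.items()
    }


# Hypothetical data: four points already assigned to two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (12.0, 10.0)]
labels = [0, 0, 1, 1]

centroids = summarize(points, labels)
```

Here four points are reduced to two centroids. On a large data set the same step shrinks the data to one representative per cluster, at the cost of losing within-cluster detail.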
What is not Cluster Analysis?
O Supervised classification
– Have class label information
O Simple segmentation
– Dividing students into different registration groups
alphabetically, by last name
O Results of a query
– Groupings are a result of an external specification
O Graph partitioning
–