JSS Journal of Statistical Software
January 2007, Volume 18, Issue 6.
Model-based Methods of Classification: Using the
mclust Software in Chemometrics
Chris Fraley, University of Washington
Adrian E. Raftery, University of Washington
Abstract
Due to recent advances in methods and software for model-based clustering, and to
the interpretability of the results, clustering procedures based on probability models are
increasingly preferred over heuristic methods. The clustering process estimates a model
for the data that allows for overlapping clusters, producing a probabilistic clustering that
quantifies the uncertainty of observations belonging to components of the mixture. The
resulting clustering model can also be used for some other important problems in multi-
variate analysis, including density estimation and discriminant analysis. Examples of the
use of model-based clustering and classification techniques in chemometric studies include
multivariate image analysis, magnetic resonance imaging, microarray image segmentation,
statistical process control, and food authenticity. We review model-based clustering and
related methods for density estimation and discriminant analysis, and show how the R
package mclust can be applied in each instance.
Keywords: model-based clustering, classification, density estimation, discriminant analysis, R,
mclust.
1. Introduction
Clustering and classification methods are among the most important techniques in multivari-
ate analysis. Due to recent advances in methods and software for model-based clustering, and
to the interpretability of the results, clustering procedures based on probability models are
increasingly preferred over heuristic methods. Finite mixture models (McLachlan and Peel
2000) provide a principled statistical approach to clustering. Each component probability
corresponds to a cluster, and models that differ in the number of components and/or component
distributions can be compared using statistical criteria. Th