17 Hierarchical clustering

Flat clustering is efficient and conceptually simple, but as we saw in Chapter 16 it has a number of drawbacks. The algorithms introduced in Chapter 16 return a flat unstructured set of clusters, require a prespecified number of clusters as input, and are nondeterministic. Hierarchical clustering (or hierarchic clustering) outputs a hierarchy, a structure that is more informative than the unstructured set of clusters returned by flat clustering. Hierarchical clustering does not require us to prespecify the number of clusters, and most hierarchical algorithms that have been used in IR are deterministic. These advantages of hierarchical clustering come at the cost of lower efficiency: the most common hierarchical clustering algorithms have a complexity that is at least quadratic in the number of documents, compared to the linear complexity of K-means and EM (cf. page 364).

This chapter first introduces agglomerative hierarchical clustering and presents four different agglomerative algorithms, which differ in the similarity measures they employ: single-link, complete-link, group-average, and centroid similarity. We then discuss the optimality conditions of hierarchical clustering and introduce top-down (or divisive) hierarchical clustering. A later section looks at labeling clusters automatically, a problem that must be solved whenever humans interact with the output of clustering. We also discuss implementation issues and give pointers to further reading, including references to soft hierarchical clustering, which we do not cover in this book.

There are few differences between the applications of flat and hierarchical clustering in information retrieval. In particular, hierarchical clustering is appropriate for any of the applications shown in Table
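To make the agglomerative idea concrete, here is a minimal sketch of bottom-up clustering with single-link similarity, one of the four measures named above. The function name, the toy 2-D points, and the use of Euclidean distance are illustrative assumptions, not taken from the chapter; a naive implementation like this one is cubic in the number of points, consistent with the at-least-quadratic complexity noted above.

```python
# Sketch of agglomerative hierarchical clustering (single-link).
# Illustrative only: data, names, and Euclidean distance are assumptions.
from math import dist  # Euclidean distance between two points (Python 3.8+)

def single_link_agglomerative(points):
    """Merge clusters bottom-up; return the sequence of merges."""
    clusters = [[i] for i in range(len(points))]  # start: one point per cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance,
        # i.e. the minimum distance over all cross-cluster point pairs.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return merges

pts = [(0.0, 0.0), (0.1, 0.0), (4.0, 0.0), (4.2, 0.0)]
for left, right, d in single_link_agglomerative(pts):
    print(left, "+", right, f"at distance {d:.2f}")
```

The merge sequence recorded here is exactly the information a dendrogram visualizes: each entry gives the two clusters combined and the similarity (here, distance) level at which they were combined.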