Graph-Based Hierarchical Conceptual Clustering
Istvan Jonyer ******@ ******@
Diane J. Cook ******@
Lawrence B. Holder ******@
Department of Computer Science and Engineering
University of Texas at Arlington
Arlington, TX 76019, USA
Abstract
Hierarchical conceptual clustering has proven to be a useful, although under-explored, data mining technique. A graph-based representation of structural information combined with a substructure discovery technique has been shown to be successful in knowledge discovery. The SUBDUE substructure discovery system provides one such combination of approaches. This work presents SUBDUE and the development of its clustering functionalities. Several examples are used to illustrate the validity of the approach in both structured and unstructured domains, as well as to compare SUBDUE to the Cobweb clustering algorithm. We also develop a new metric for comparing structurally-defined clusterings. Results show that SUBDUE successfully discovers hierarchical clusterings in both structured and unstructured data.
Keywords: Clustering, Cluster Analysis, Concept Formation, Structural Data, Graph Match
Introduction
Data mining has become a prominent research area in recent years. One of the major reasons is the ever-increasing amount of data collected in diverse areas of the industrial and scientific world. Much of this data contains valuable knowledge that is not easily retrievable. The increasing speed and capacity of computer technology has made it feasible to use various data mining techniques to automatically extract knowledge from this information. Such knowledge may take the form of predictive rules, clusters, or hierarchies.
Beyond simple attributes of objects, many databases store structural information about relationships between objects. These structural databases provide a significant source of information for data mining. A well-publicized example is genome data, which is inherently structural (e.g., DNA atoms bonded to other atoms) and