Chapter 5
Data putation
7/28/2018
What is concept description?
Generate descriptions for characterization and comparison of the data
the simplest kind of descriptive data mining
sometimes called class description when the concept to be described refers to a class of objects
Characterization: provide a concise and succinct summarization of the given collection of data
Comparison (discrimination): provide descriptions comparing two or more collections of data
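As a rough illustration of the two tasks, the sketch below summarizes one collection by the relative frequency of an attribute's values (characterization) and then places the summaries of two collections side by side (comparison). The function names `characterize` and `compare` and the student records are hypothetical, chosen only for this example.

```python
from collections import Counter

def characterize(rows, attribute):
    """Summarize one collection: relative frequency of each attribute value."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def compare(target, contrast, attribute):
    """Pair the two summaries, one (target, contrast) entry per attribute value."""
    t = characterize(target, attribute)
    c = characterize(contrast, attribute)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}

# Hypothetical data: two collections (classes) of student records.
graduates = [{"major": "CS"}, {"major": "CS"}, {"major": "Physics"}, {"major": "CS"}]
undergraduates = [{"major": "CS"}, {"major": "Physics"},
                  {"major": "Physics"}, {"major": "Biology"}]

grad_summary = characterize(graduates, "major")
side_by_side = compare(graduates, undergraduates, "major")
for value, (grad_share, undergrad_share) in side_by_side.items():
    print(f"{value}: graduates={grad_share:.2f}, undergraduates={undergrad_share:.2f}")
```

Real concept description would first generalize the data to a suitable conceptual level; this sketch skips that step and only shows the difference between describing one collection and contrasting two.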
Data generalization
both characterization and discrimination are based on data generalization and summarization
Data generalization
a process which abstracts a large set of task-relevant data in a database from a relatively low conceptual level to higher conceptual levels
Data generalization approaches:
data cube approach
attribute-oriented induction approach
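A minimal sketch of what abstracting data "from a relatively low conceptual level to higher conceptual levels" can look like: each low-level value of one attribute is replaced by its parent in a concept hierarchy, and identical generalized tuples are merged with a count. The hierarchy `CITY_TO_COUNTRY`, the function `generalize`, and the sample rows are assumptions made for illustration, not part of the original slides.

```python
from collections import Counter

# Hypothetical concept hierarchy: city (low level) -> country (higher level).
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Seattle": "USA",
    "Chicago": "USA",
}

def generalize(rows, attribute, hierarchy):
    """Replace each value of `attribute` with its parent concept,
    then merge identical generalized tuples and count them."""
    merged = Counter()
    for row in rows:
        new_row = dict(row)
        new_row[attribute] = hierarchy[row[attribute]]
        # A sorted tuple of items makes the generalized row hashable.
        merged[tuple(sorted(new_row.items()))] += 1
    return merged

# Hypothetical task-relevant data.
rows = [
    {"city": "Vancouver", "major": "CS"},
    {"city": "Toronto", "major": "CS"},
    {"city": "Seattle", "major": "CS"},
    {"city": "Chicago", "major": "Physics"},
]

summary = generalize(rows, "city", CITY_TO_COUNTRY)
for generalized_row, count in summary.items():
    print(dict(generalized_row), "count =", count)
```

Four original tuples collapse into three generalized tuples here; with larger data and higher levels of the hierarchy the reduction is much stronger.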
Data cube approach
The data for analysis are stored in a multidimensional database, or data cube
generalization and specialization can be performed on a data cube by roll-up and drill-down
this is not an approach for concept description, only for data generalization
Limitations:
commercial data cube implementations confine the types of dimensions to simple nonnumeric data and the types of measures to simple aggregated numeric values
concept hierarchies can be automatically generated from numeric data to form numeric dimensions; however, this is a result of recent data mining research and is not available in commercial systems
cannot tell which dimensions should be used or what level the generalization should reach
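The roll-up and drill-down operations mentioned above can be sketched on a tiny two-dimensional cube held in a dictionary. This is only an illustration under stated assumptions: the cell values, the hierarchy `CITY_TO_COUNTRY`, and the functions `roll_up` and `drill_down` are hypothetical and do not correspond to any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical base cuboid: (city, quarter) -> sales measure.
BASE = {
    ("Vancouver", "Q1"): 100,
    ("Vancouver", "Q2"): 150,
    ("Toronto", "Q1"): 80,
    ("Seattle", "Q1"): 120,
}

# Hypothetical concept hierarchy on the location dimension.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"}

def roll_up(cells, hierarchy):
    """Generalize: climb the location dimension one level and re-aggregate."""
    out = defaultdict(int)
    for (city, quarter), sales in cells.items():
        out[(hierarchy[city], quarter)] += sales
    return dict(out)

def drill_down(cells, hierarchy, parent, quarter):
    """Specialize: return the finer cells that feed one rolled-up cell."""
    return {
        key: sales
        for key, sales in cells.items()
        if hierarchy[key[0]] == parent and key[1] == quarter
    }

by_country = roll_up(BASE, CITY_TO_COUNTRY)
canada_q1 = drill_down(BASE, CITY_TO_COUNTRY, "Canada", "Q1")
print(by_country)
print(canada_q1)
```

Note that the caller chooses the dimension and the level to roll up to, which reflects the last limitation listed above: the cube itself does not decide which dimensions to use or how far to generalize.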
Is OLAP enough?
OLAP
restricted to certain kinds of attributes and measure types
user-controlled process
Concept description
can handle complex data types of the attributes and their aggregations
a more automated process
Attribute-oriented induction
proposed in 1989
Y. Cai, N. Cercone, and J. Han, KDD Workshop at IJCAI-89
in its initial proposal, AOI is a relational database query-oriented, generalization-based, online data ana