Document overview: Chapter 4
Primitives for Data Mining
7/28/2018
Why data mining primitives?
Can we expect a data mining system to autonomously mine all of the valuable knowledge?
such a system may generate an overwhelmingly large set of patterns
most of the mined patterns may be irrelevant to the analysis task
many of the mined patterns, although related to the analysis task, may be difficult to understand, or may lack validity, novelty, or utility
A data mining query language that incorporates necessary primitives can help users flexibly interact with the data mining system
What defines a data mining task?
What is the data set that you want to mine?
What kind of knowledge do you want to mine?
What background knowledge could be useful?
Which measurements can be used to estimate the interestingness of patterns?
How should the discovered patterns be presented?
to develop or use a data mining query language,
you must know what defines a data mining task
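As a rough illustration, the five questions above can be collected into a single task specification. The sketch below is not a real data mining query language: the class `MiningTask`, its field names, and the sample values are all hypothetical and chosen only to mirror the list.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container for the five primitives listed above."""
    task_relevant_data: str                                   # which data set to mine
    knowledge_type: str                                       # what kind of knowledge to mine
    background_knowledge: list = field(default_factory=list)  # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # measure name -> threshold
    presentation: str = "table"                               # how to present the patterns

    def missing_primitives(self):
        """Return the names of primitives that were left empty."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative values only; nothing here comes from a real system.
task = MiningTask(
    task_relevant_data="sales records for 2018",
    knowledge_type="association",
    interestingness={"support": 0.05, "confidence": 0.7},
)
print(task.missing_primitives())
```

Here the task leaves the background knowledge unspecified, so that is the one primitive the check reports.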
Task-relevant data
this is the database portion to be investigated
database or data warehouse name
database tables or data warehouse cube
conditions for data selection
relevant attributes or dimensions
data grouping criteria
the data collection process results in a new data relation, called the initial data relation
the initial data relation may or may not correspond to a physical relation in the database
the portion of the database to be mined is called a minable view
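The items on this slide map naturally onto a relational query. The following is a minimal sketch using Python's built-in `sqlite3` module; the `sales` table, its columns, and the sample rows are invented for illustration. It selects a table, applies a selection condition, keeps the relevant attributes, groups the data, and exposes the result as a view, which is a relation that does not exist physically in the database.

```python
import sqlite3

# In-memory database standing in for the "database or data warehouse".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, city TEXT, item TEXT, price REAL, year INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("ann", "Vancouver", "tv", 900.0, 2018),
        ("bob", "Vancouver", "camera", 450.0, 2018),
        ("cat", "Toronto", "tv", 950.0, 2018),
        ("dan", "Toronto", "phone", 300.0, 2017),
    ],
)

# Minable view: relevant attributes (city, item), a selection condition
# (year = 2018), and grouping criteria (city, item).
conn.execute(
    """
    CREATE VIEW minable_view AS
    SELECT city, item, COUNT(*) AS n_sales, SUM(price) AS revenue
    FROM sales
    WHERE year = 2018
    GROUP BY city, item
    """
)

rows = conn.execute("SELECT * FROM minable_view ORDER BY city, item").fetchall()
for row in rows:
    print(row)
```

Because `minable_view` is a view rather than a table, it is an example of an initial data relation that does not correspond to a physical relation.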
Types of knowledge to be mined
Characterization
Discrimination
Association
Classification/prediction
Clustering
Outlier analysis
……
in addition to specifying the type of knowledge to be mined, the user can be more specific and provide pattern templates that all discovered patterns must match
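One simple way to picture a pattern template is as a filter on candidate rules. The sketch below assumes a rule is a pair of antecedent attributes and one consequent attribute, and a template names the attributes allowed in the antecedent and the attribute required in the consequent. The function `matches_template` and the attribute names are hypothetical, not part of any specific mining system.

```python
def matches_template(rule, template):
    """Return True if the rule fits the template.

    rule:     (antecedent attributes, consequent attribute)
    template: (allowed antecedent attributes, required consequent attribute)
    """
    antecedent, consequent = rule
    allowed_antecedent, required_consequent = template
    return set(antecedent) <= set(allowed_antecedent) and consequent == required_consequent


# Template: customer attributes on the left, "buys" on the right.
template = ({"age", "income", "occupation"}, "buys")

candidates = [
    (("age", "income"), "buys"),     # fits the template
    (("city",), "buys"),             # antecedent attribute not allowed
    (("age",), "credit_rating"),     # wrong consequent
]

kept = [rule for rule in candidates if matches_template(rule, template)]
print(kept)
```

Only the first candidate survives, since the other two use an attribute or a consequent that the template does not permit.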
Background knowledge (I)
Four types of concept hierarchies:
schema hierarchy
a total or partial order among attributes in the database schema
e.g., city < province_or_state < country
set-grouping hierarchy
a total or pa