Using Emerging Patterns to Analyze Gene Expression Data
Jinyan Li
Computing Group
Knowledge & Discovery Program
Laboratories for Information Technology
Singapore
Outline
Introduction
Brief history of decision trees and emerging patterns
Basic ideas for decision trees and EPs
Advanced comparisons using gene expression data
Summary
Introduction
Decision trees and emerging patterns both provide classification rules
Sharp discrimination power (no or little uncertainty)
Advantage over black-box learning models
Decision trees not the best (on accuracy)
EP-based classifiers: competitive with the best
Brief history of decision trees
CLS (Hunt et al., 1966) --- cost-driven
ID3 (Quinlan, 1986, MLJ) --- information-driven
C4.5 (Quinlan, 1993) --- pruning ideas
CART (Breiman et al. 1984) --- Gini Index
Brief history of emerging patterns
General EP (Dong & Li, 1999, SIGKDD)
CAEP (Dong et al., DS99), JEP-C (Li et al., KAIS00)
EP-space (Li et al., ICML00), DeEPs (Li et al., MLJ)
PCL (Li & Wong, ECML02)
Basic definitions
Relational data
Attributes (color, gene_x), attribute values (red, ), attribute-value pair (equivalently, condition, item)
Patterns, instances
Training data, test data
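To make these definitions concrete, here is a minimal Python sketch. It assumes an instance is modeled as a set of attribute-value pairs (items) and a pattern as a smaller set of items. The helper names `as_items` and `occurs_in` and the attribute `shape` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: instances and patterns as sets of attribute-value pairs.
# as_items, occurs_in and the "shape" attribute are hypothetical, not from the talk.

def as_items(record):
    """Turn a dict of attribute -> value into a frozenset of (attribute, value) items."""
    return frozenset(record.items())

def occurs_in(pattern, instance):
    """A pattern occurs in an instance when every item of the pattern is in the instance."""
    return pattern <= instance

instance = as_items({"color": "red", "shape": "round"})
pattern = as_items({"color": "red"})

print(occurs_in(pattern, instance))                      # True
print(occurs_in(as_items({"color": "blue"}), instance))  # False
```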
A simple dataset
Outlook   Temp  Humidity  Windy  Class
Sunny     75    70        true   Play
Sunny     80    90        true   Don’t
Sunny     85    85        false  Don’t
Sunny     72    95        true   Don’t
Sunny     69    70        false  Play
Overcast  72    90        true   Play
Overcast  83    78        false  Play
Overcast  64    65        true   Play
Overcast  81    75        false  Play
Rain      71    80        true   Don’t
Rain      65    70        true   Don’t
Rain      75    80        false  Play
Rain      68    80        false  Play
Rain      70    96        false  Play
9 Play samples
5 Don’t
A total of 14.
A decision tree
outlook
  sunny: humidity
    <= 75: Play (2)
    > 75: Don’t (3)
  overcast: Play (4)
  rain: windy
    false: Play (3)
    true: Don’t (2)
NP-complete problem
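The tree can be read as a set of nested tests. The sketch below is one way to write it in Python, using the 14 rows of the simple dataset. The function name `classify` and the tuple layout are illustrative assumptions. It also checks that every training row reaches a leaf carrying its own class label.

```python
# Sketch of the decision tree as nested tests. classify() and the row layout are
# illustrative assumptions; the rows are copied from the simple dataset.
ROWS = [
    # (outlook, temp, humidity, windy, class)
    ("sunny", 75, 70, True, "Play"),
    ("sunny", 80, 90, True, "Don't"),
    ("sunny", 85, 85, False, "Don't"),
    ("sunny", 72, 95, True, "Don't"),
    ("sunny", 69, 70, False, "Play"),
    ("overcast", 72, 90, True, "Play"),
    ("overcast", 83, 78, False, "Play"),
    ("overcast", 64, 65, True, "Play"),
    ("overcast", 81, 75, False, "Play"),
    ("rain", 71, 80, True, "Don't"),
    ("rain", 65, 70, True, "Don't"),
    ("rain", 75, 80, False, "Play"),
    ("rain", 68, 80, False, "Play"),
    ("rain", 70, 96, False, "Play"),
]

def classify(outlook, humidity, windy):
    """Follow the tree: outlook at the root, then humidity (sunny) or windy (rain)."""
    if outlook == "overcast":
        return "Play"
    if outlook == "sunny":
        return "Play" if humidity <= 75 else "Don't"
    # Remaining branch: outlook == "rain"
    return "Don't" if windy else "Play"

# Count training rows whose predicted class differs from the recorded class.
errors = sum(classify(o, h, w) != c for o, _t, h, w, c in ROWS)
print(errors)  # 0
```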
A heuristic
Using information gain to select the most discriminatory feature (for tree and sub-trees)
Recursive subdivision over the original training data
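As a rough illustration of the heuristic, the sketch below computes information gain for three candidate tests on the 14-row dataset. It assumes Shannon entropy in bits, binarizes humidity at 75 (the threshold shown in the tree), and leaves temperature out for brevity. The names `entropy`, `info_gain` and `FEATURES` are illustrative.

```python
# Sketch of information-gain feature selection on the 14-row dataset.
# Assumptions: entropy in bits; humidity binarized at 75; temperature omitted.
from collections import Counter
from math import log2

ROWS = [
    # (outlook, temp, humidity, windy, class)
    ("sunny", 75, 70, True, "Play"), ("sunny", 80, 90, True, "Don't"),
    ("sunny", 85, 85, False, "Don't"), ("sunny", 72, 95, True, "Don't"),
    ("sunny", 69, 70, False, "Play"), ("overcast", 72, 90, True, "Play"),
    ("overcast", 83, 78, False, "Play"), ("overcast", 64, 65, True, "Play"),
    ("overcast", 81, 75, False, "Play"), ("rain", 71, 80, True, "Don't"),
    ("rain", 65, 70, True, "Don't"), ("rain", 75, 80, False, "Play"),
    ("rain", 68, 80, False, "Play"), ("rain", 70, 96, False, "Play"),
]

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, feature):
    """Class entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row in rows:
        groups.setdefault(feature(row), []).append(row[4])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[4] for row in rows]) - remainder

FEATURES = {
    "outlook": lambda r: r[0],
    "humidity<=75": lambda r: r[2] <= 75,
    "windy": lambda r: r[3],
}

gains = {name: info_gain(ROWS, f) for name, f in FEATURES.items()}
best = max(gains, key=gains.get)
print(best)  # outlook
```

Under these assumptions outlook has the largest gain, which is consistent with it sitting at the root of the tree. Repeating the same selection on each resulting subset (for example, only the sunny rows) would give the sub-trees.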
Characteristics of trees
Single coverage of training data (elegance)
Div