Decision Tree
Zhang Xiaohang, School of Economics and Management, Beijing University of Posts and Telecommunications (BUPT)
Contents
Basic Concepts
CHAID
CART
Tree in SAS EM
Data Mining Lab, School of Economics and Management, BUPT, Zhang Xiaohang
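Fitted Decision Tree
New case: DEBTINC = 20, NINQ = 2, DELINQ = 0; BAD is to be predicted.
[Figure: fitted decision tree for the target BAD. The root splits on DEBTINC (< 45 vs. ≥ 45); lower splits are on DELINQ (0, 1-2, > 2) and NINQ (0-1, > 1). Leaf values shown in the figure: 2%, 10%, 21%, 45%, 45%, 75%.]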
Divide and Conquer
All cases: n = 5,000, 10% BAD
Split: Debt-to-Income Ratio < 45
yes: n = 4,300, 1% BAD
no: n = 700, % BAD
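The divide-and-conquer step can be illustrated with a minimal Python sketch. The eight cases below are made up for illustration and are not the 5,000 cases on the slide; the field names `debt_to_income` and `bad` and the helper names are assumptions as well. Only the cutoff of 45 comes from the slide.

```python
def bad_rate(cases):
    """Fraction of cases with bad == 1."""
    return sum(c["bad"] for c in cases) / len(cases)

def divide(cases, ratio_cutoff=45):
    """Partition cases on Debt-to-Income Ratio < ratio_cutoff (yes / no)."""
    yes = [c for c in cases if c["debt_to_income"] < ratio_cutoff]
    no = [c for c in cases if c["debt_to_income"] >= ratio_cutoff]
    return yes, no

# Hypothetical toy data, NOT the data behind the slide.
cases = [
    {"debt_to_income": 20, "bad": 0},
    {"debt_to_income": 25, "bad": 0},
    {"debt_to_income": 30, "bad": 0},
    {"debt_to_income": 35, "bad": 1},
    {"debt_to_income": 40, "bad": 0},
    {"debt_to_income": 50, "bad": 1},
    {"debt_to_income": 55, "bad": 1},
    {"debt_to_income": 60, "bad": 0},
]

yes, no = divide(cases)
print("all:", len(cases), bad_rate(cases))
print("yes:", len(yes), bad_rate(yes))
print("no: ", len(no), bad_rate(no))
```

Each branch can then be divided again in the same way, which is what makes the procedure recursive.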
The Cultivation of Trees
Split Search: Which splits are to be considered?
Splitting Criterion: Which split is best?
Stopping Rule: When should the splitting stop?
Pruning Rule: Should some branches be lopped off?
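The four components above can be sketched as one small recursive procedure. This is a minimal illustration under stated assumptions, not the CHAID, CART, or SAS EM implementation: it assumes numeric inputs, binary splits, Gini impurity as the splitting criterion, a depth and size limit as the stopping rule, and a deliberately simplistic pruning rule. All function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def candidate_splits(rows):
    """Split search: midpoints between consecutive distinct values of each input."""
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            yield j, (lo + hi) / 2

def best_split(rows, labels):
    """Splitting criterion (assumed): largest reduction in Gini impurity."""
    best = None
    for j, t in candidate_splits(rows):
        left = [y for r, y in zip(rows, labels) if r[j] < t]
        right = [y for r, y in zip(rows, labels) if r[j] >= t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        gain = gini(labels) - weighted
        if best is None or gain > best[0]:
            best = (gain, j, t)
    return best

def grow(rows, labels, depth=0, max_depth=3, min_size=2):
    """Stopping rule (assumed): pure node, too few cases, depth limit, or no gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_size or gini(labels) == 0.0:
        return {"predict": majority}
    found = best_split(rows, labels)
    if found is None or found[0] <= 0.0:
        return {"predict": majority}
    _, j, t = found
    left = [(r, y) for r, y in zip(rows, labels) if r[j] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] >= t]
    return {
        "input": j, "threshold": t, "predict": majority,
        "left": grow(*zip(*left), depth + 1, max_depth, min_size),
        "right": grow(*zip(*right), depth + 1, max_depth, min_size),
    }

def prune(node):
    """Pruning rule (simplistic): collapse a split whose two leaves agree."""
    if "input" not in node:
        return node
    node["left"], node["right"] = prune(node["left"]), prune(node["right"])
    kids = (node["left"], node["right"])
    if all("input" not in k for k in kids) and kids[0]["predict"] == kids[1]["predict"]:
        return {"predict": kids[0]["predict"]}
    return node

# Hypothetical toy data: one numeric input, binary target.
rows = [(10,), (20,), (50,), (60,)]
labels = [0, 0, 1, 1]
tree = prune(grow(rows, labels))
print(tree)
```

Practical pruning usually relies on validation data or a cost-complexity measure; the rule here only shows where pruning sits in the workflow.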
Possible Splits to Consider
[Figure: number of possible splits (vertical axis, 1 to 500,000) against the number of input levels (horizontal axis, 2 to 20), with one curve for a nominal input and one for an ordinal input.]
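How quickly the number of candidate splits grows with the number of input levels can be checked with standard counting results. The sketch below is an illustration, not a reconstruction of the figure: which of these quantities the figure plots is an assumption left open here. For an input with L levels, an ordinal input allows L - 1 binary splits and 2^(L-1) - 1 splits into two or more contiguous groups; a nominal input allows 2^(L-1) - 1 binary splits and B(L) - 1 splits into two or more groups, where B(L) is the Bell number. The function names are hypothetical.

```python
def bell_number(n):
    """Number of ways to partition a set of n items, via the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

def split_counts(levels):
    """Count candidate splits for an input with the given number of levels."""
    return {
        # Ordinal: groups must be contiguous in the level ordering.
        "ordinal_binary": levels - 1,
        "ordinal_multiway": 2 ** (levels - 1) - 1,
        # Nominal: any grouping of levels is allowed.
        "nominal_binary": 2 ** (levels - 1) - 1,
        "nominal_multiway": bell_number(levels) - 1,
    }

for levels in (2, 4, 8, 12, 16, 20):
    print(levels, split_counts(levels))
```

With 20 levels, 2^19 - 1 = 524,287, which is the same order of magnitude as the top of the vertical axis in the figure.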
Splitting Criteria
Left Right
Good 2700 1800 4500 Worthless Split
Bad