Document name: CART=Classification and Regression Trees:分类和回归树.ppt

Format: PPT, 31 pages

Uploaded by: 薄荷牛奶, 2016/1/31. File size: 0 KB

Document description

CART: Classification and Regression Trees

Presented by: Pavla Smetanova, Lütfiye Arslan, Stefan Lhachimi

Based on the book “Classification and Regression Trees” by L. Breiman, J. Friedman, R. Olshen, and C. Stone (1984).

Outline

1- INTRODUCTION
• What is CART?
• An example
• Terminology
• Strengths

2- METHOD: 3 steps in CART
• Tree building
• Pruning
• The final tree

What is CART?
• A non-parametric technique that uses the methodology of tree building.
• Classifies objects or predicts outcomes by selecting, from a large number of variables, the most important ones in determining the outcome variable.
• CART analysis is a form of binary recursive partitioning.

An example from clinical research
• Development of a reliable clinical decision rule to classify new patients into categories.
• 19 measurements (age, blood pressure, etc.) are taken from each heart-attack patient during the first 24 hours after admission to San Diego Hospital.
• The goal: identify high-risk patients.

Classification of patients into high-risk and no-risk groups
• Is the minimum systolic blood pressure over the initial 24 hours > 91? (yes / no)
• Is age > ? (yes / no)
• Is sinus tachycardia present? (yes / no)
• Leaf labels: G, F, G, F

Terminology
• The classification problem: a systematic way of predicting the class of an object based on measurements.
• C = {1, ..., J}: classes
• x: measurement vector
• d(x): a classifying function assigning every x to one of the classes 1, ..., J
• s: split
• Learning sample (L): measurement data on N cases observed in the past, together with their actual classification.
• R*(d): true misclassification rate, R*(d) = P(d(x) ≠ Y), Y ∈ C

Strengths
• No distributional assumptions are required.
• No assumption of homogeneity.
• The explanatory variables can be a mixture of categorical, interval, and continuous.
• Especially good for high-dimensional and large data sets. Produces useful results by using a few important variables.
• Sophisticated methods for dealing with missing variab