A Decision Tree Algorithm Based on Degree of Rough Classification
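WU Ming-quan 1, LIU Tong-xuan 1, CHEN Xiao-wei 1
(College of Computer and Communication Engineering, China University of Petroleum (East China), Dongying 257061) 1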
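Abstract In constructing a decision tree, the attribute-splitting criterion directly affects classification performance. To address the ID3 algorithm's insufficient emphasis on attribute classification accuracy, this paper proposes the concept of the degree of rough classification, based on rough set theory, and uses it as the criterion for selecting the splitting attribute. The method takes full account of the influence of attribute classification accuracy on the classification result, while also accounting for the dependency between the condition attributes and the decision attribute. Experiments show that, compared with decision trees built by the traditional information-entropy-based method, it effectively improves classification accuracy.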
Keywords Classification Accuracy; Attribute Relevance; Rough Set; Decision Tree; Information Gain
CLC number: TP182 Document code: A
An Algorithm for Decision Tree Construction Based on Degree of Rough Classification
ZHANG Qiong-sheng 1, WU Ming-quan 1, LIU Tong-xuan 1, CHEN Xiao-wei 1
(College of Computer and Communication Engineering, China University of Petroleum, Dongying 257061, China) 1
Abstract In decision tree construction, the attribute-splitting criterion directly affects the classification results. To address the weakness of ID3 with respect to classification accuracy, we propose the concept of degree of rough classification as the criterion for selecting the splitting attribute. The method takes into account both classification accuracy and the dependency between condition attributes and decision attributes. Experiments show that, compared with traditional entropy-based decision trees, the decision tree constructed by our method effectively improves the classification results.
Keywords Classification Accuracy; Attribute Relevance; Rough Set; Decision Tree; Information Gain
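1 Introduction
Decision tree learning is an inductive inference algorithm based on learning from examples; it aims to derive rules, represented as a decision tree, from a set of unordered, irregular instances. Among the various methods for solving classification problems, the decision tree method is the most widely used. It takes a top-down, divide-and-conquer approach, partitioning the search space into several mutually disjoint subsets and forming a tree structure that resembles a flowchart. The method is fast and is easily converted into simple, easily understood classification rules. The ID3 [2] algorithm is a decision tree learning algorithm based on information entropy and is representative of decision tree algorithms, but methods based on information entropy only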