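Association Rule Clustering Based on Product Taxonomy Information
RUAN Bei-jun, ZHU Yang-yong
(Department of Computing and Information Technology, Fudan University, Shanghai 200433)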
(E-mail: ******@.cn)
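Abstract: Association rule mining often produces a large number of rules. To support exploratory analysis, these rules must be organized effectively, and clustering is one effective way to do so. Existing rule clustering methods all need to scan the original dataset when computing the distance between rules, which is very inefficient. In addition, they produce a fixed number of clusters, which hinders exploratory analysis. To address these problems, a new method is proposed. It measures the distance between rules using product taxonomy information, which avoids the time-consuming scan of the original dataset, and then applies the OPTICS clustering algorithm to produce a clustering structure that is convenient for exploratory analysis. Finally, experiments were carried out on real transaction data from a retail company, and the clustering results were demonstrated with a visualization tool. The experimental results show that the method is practical and effective.
Keywords: data mining, association rules, clustering, visualization
CLC number: TP311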
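The abstract does not give the distance formula itself, so the following is only a minimal sketch of the general idea: computing a distance between two rules from a product taxonomy alone, without touching the transaction data. The taxonomy contents, the names `PARENT`, `item_distance` and `rule_distance`, and the choice of an averaged nearest-item path length are all illustrative assumptions, not the authors' definition.

```python
# Hypothetical product taxonomy: child -> parent (the root has parent None).
PARENT = {
    "food": None,
    "dairy": "food",
    "bakery": "food",
    "milk": "dairy",
    "cheese": "dairy",
    "bread": "bakery",
}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def item_distance(a, b):
    """Number of taxonomy edges between a and b via their lowest common ancestor."""
    depth_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for steps_from_a, n in enumerate(ancestors(a)):
        if n in depth_from_b:
            return steps_from_a + depth_from_b[n]
    raise ValueError("items do not share a taxonomy root")


def rule_distance(r1, r2):
    """Assumed measure: symmetric average of nearest-item taxonomy distances.

    Each rule is a pair (antecedent_items, consequent_items).
    """
    s1 = set(r1[0]) | set(r1[1])
    s2 = set(r2[0]) | set(r2[1])
    d12 = sum(min(item_distance(a, b) for b in s2) for a in s1) / len(s1)
    d21 = sum(min(item_distance(a, b) for a in s1) for b in s2) / len(s2)
    return (d12 + d21) / 2


# Two rules that differ only in a sibling item (milk vs. cheese).
rule_a = (("milk",), ("bread",))
rule_b = (("cheese",), ("bread",))
print(rule_distance(rule_a, rule_b))
```

Because only the taxonomy is consulted, the cost of one distance depends on the rule sizes and the taxonomy depth rather than on the number of transactions. A matrix of such pairwise distances could then be passed to any OPTICS implementation that accepts precomputed distances.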
Association Rule Clustering Based on Taxonomy Information
RUAN Bei-jun, ZHU Yang-yong
(Department of Computing and Information Technology, Fudan University, Shanghai 200433, China)
Abstract: Association rule mining often produces a large number of rules. To facilitate exploratory analysis, the rules need to be organized effectively, and clustering is a useful way to do so. All existing methods for clustering rules must scan the original dataset to determine the distances between rules, which is costly. Moreover, these methods produce a fixed number of clusters, which makes exploratory analysis difficult. A new method is proposed to overcome these problems. Taxonomy information is used to measure the distances between rules, so the expensive scan of the original dataset is avoided. A clustering algorithm, OPTICS, is then applied to generate a clustering structure suitable for exploratory analysis. Finally, an experiment is conducted on a real-life dataset and the experimental result is presented via a visualization tool, showing