Abstract
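In recent years, with the rapid advance of science and technology, computer application technology has made great progress and become more widespread, and the volume of data that must be analyzed and processed keeps growing. Data streams, an increasingly common data source, have attracted considerable attention from researchers, and cluster analysis on data streams has become a very active research topic.
Unlike traditional database clustering, clustering on data streams has distinctive characteristics: a single-pass scan, limited storage space, and a requirement for real-time results. Traditional clustering algorithms therefore cannot be applied well within a data stream clustering framework, and new stream-based clustering algorithms must be designed to meet these requirements.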
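This paper studies the Affinity Propagation (AP) clustering algorithm and AP-based data stream clustering algorithms in depth, and on that basis proposes a new AP-based data stream clustering algorithm, NewSTRAP. The algorithm first uses AP clustering to form the initial clusters and obtain the related information. Each newly arriving data item is then assigned to the cluster with which it has the greatest similarity, and the effect of that assignment on clustering accuracy is computed. When the clustering accuracy no longer meets the requirement, a major adjustment is performed to improve it. Finally, experiments compare the STRAP and NewSTRAP clustering algorithms in terms of clustering accuracy and time performance: NewSTRAP achieves higher clustering accuracy than STRAP, while STRAP is slightly better than NewSTRAP in time performance.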
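The assign-then-monitor loop described above can be illustrated with a minimal Python sketch. It is not the NewSTRAP implementation: the abstract does not define the accuracy measure or the adjustment step, so the sketch assumes mean squared distance to the assigned exemplar as the quality measure, takes the initial exemplars as given (for example, from an AP run), and delegates the global adjustment to a caller-supplied `recluster` function. All names here (`similarity`, `assign`, `process_stream`, `max_mean_distortion`) are hypothetical, and the sketch keeps every point in memory for brevity, which a real stream algorithm could not afford.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Negative squared Euclidean distance, a similarity commonly used with AP."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def assign(point: Point, exemplars: List[Point]) -> int:
    """Return the index of the exemplar most similar to the point."""
    return max(range(len(exemplars)), key=lambda i: similarity(point, exemplars[i]))


def process_stream(
    stream: Sequence[Point],
    exemplars: List[Point],
    recluster: Callable[[List[Point]], List[Point]],
    max_mean_distortion: float,
) -> Tuple[List[int], int]:
    """Assign each arriving point and re-cluster when quality degrades.

    Quality is the mean squared distance to the assigned exemplar, an
    assumed stand-in for the clustering-accuracy measure in the text.
    Returns the final labels and the number of global adjustments.
    """
    labels: List[int] = []
    seen: List[Point] = []  # simplification: a real stream keeps only a summary
    total_distortion = 0.0
    adjustments = 0
    for point in stream:
        seen.append(point)
        k = assign(point, exemplars)
        total_distortion += -similarity(point, exemplars[k])
        if total_distortion / len(seen) > max_mean_distortion:
            # Quality is no longer acceptable: rebuild the exemplars globally
            # and relabel everything seen so far, including the current point.
            exemplars = recluster(seen)
            labels = [assign(p, exemplars) for p in seen]
            total_distortion = sum(
                -similarity(p, exemplars[l]) for p, l in zip(seen, labels)
            )
            adjustments += 1
        else:
            labels.append(k)
    return labels, adjustments
```

Passing the re-clustering step in as a function keeps the sketch independent of any particular AP implementation; the threshold controls how often the costly global adjustment runs, which is the accuracy-versus-time trade-off the comparison with STRAP concerns.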
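This paper also makes a preliminary study of applications of data stream clustering, discussing two scenarios: intrusion detection and facility location.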
Keywords: data stream clustering algorithm; AP clustering algorithm
Abstract
In recent years, with the rapid development of computer application technology and its growing popularity, people's ability to acquire data has increased rapidly. As a common source of data, data streams have received a lot of attention, and data stream based cluster analysis has become a popular topic.
Different from traditional database clustering, data stream clustering has several characteristics: a single-pass scan of the data, limited memory space, and real-time clustering results. Because traditional clustering algorithms cannot be applied well to data streams, new data stream based algorithms must be designed to meet these requirements.
This paper studies the Affinity Propagation (AP) clustering algorithm and AP-based data stream clustering algorithms in depth. It also proposes a new AP-based data stream clustering algorithm, NewSTRAP. First, the AP algorithm is used to generate the initial clusters and obtain the cluster information; then, each new data item is assigned to the cluster with which it has the greatest similarity, and the impact on clustering accuracy is calculated. When the clustering accuracy cannot meet the requirement, a global adjustment is made to improve the clustering accuracy. Finally, the paper compares the STRAP and NewSTRAP clustering algorithms in clustering accuracy and time performance, and the conclusion is that, in the aspect of c