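Hebei University
Master's Thesis
Learning-Based TOP-N Query Processing for Data Streams
Name: Cao Zhiqiang
Degree level: Master
Major: Computer Application Technology
Supervisor: Zhu Liang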
2011-05
Abstract
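Research on data streams shows that they are characterized by real-time arrival,
continuity, universality, and semantic uncertainty, among other properties. This thesis
first summarizes the strengths and weaknesses of traditional techniques such as the
histogram, sampling, hashing, and wavelet methods. Building on this, and in view of the
characteristics of data streams, it proposes constructing a summary database with a
time-based sliding window model. This approach effectively overcomes the limitations of
traditional techniques in processing data streams and thereby makes it possible to apply
learning-based TOP-N queries to data stream problems. Next, the thesis analyzes the
strengths and weaknesses of traditional TOP-N selection queries and, on that basis,
proposes a learning-based TOP-N query method. The method first builds a knowledge base
that stores query profiles; once the knowledge base is built, queries are answered by
searching it directly. When the knowledge base is searched, the regional distribution
density ρ is computed first, and the query radius r is then derived from ρ, so that the
N required results can be obtained approximately. When a new batch of data arrives, an
update strategy is applied to maintain both the summary database and the knowledge base.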
Keywords: data stream; sliding window model; summary database; TOP-N query; knowledge base
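As a rough illustration of a time-based sliding window over the summary database, the Python sketch below keeps only the tuples whose timestamps fall within the last `width` time units and evicts expired tuples whenever a new batch arrives. The class `TimeSlidingWindow`, its methods, and the assumption that batches arrive in timestamp order are hypothetical; the thesis's summary database may store aggregated summaries rather than raw tuples.

```python
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, ...]


class TimeSlidingWindow:
    """Hypothetical summary store that keeps only tuples from the last `width` time units."""

    def __init__(self, width: float) -> None:
        self.width = width
        # (timestamp, tuple) pairs, oldest first.
        self._items: Deque[Tuple[float, Point]] = deque()

    def insert_batch(self, batch: List[Tuple[float, Point]], now: float) -> None:
        # Assumes each batch is sorted by timestamp and arrives after earlier batches.
        self._items.extend(batch)
        self.expire(now)

    def expire(self, now: float) -> None:
        # Drop tuples whose timestamp lies outside the window (now - width, now].
        while self._items and self._items[0][0] <= now - self.width:
            self._items.popleft()

    def snapshot(self) -> List[Point]:
        # Current contents of the window, oldest first.
        return [p for _, p in self._items]
```

With `width = 10.0`, a tuple stamped at time 1.0 is still present at time 5.0 but is evicted when a batch arrives at time 12.0. A deque is used because expiry always removes from the oldest end, which makes each eviction O(1).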
Abstract
Research on data streams shows that they are characterized by real-time arrival, continuity,
universality, semantic uncertainty, and other features. This paper summarizes the advantages
and disadvantages of traditional data stream processing techniques such as the histogram,
sampling, hashing, and wavelet methods. Building on these methods and on the characteristics
of data streams, this paper proposes constructing the summary database with a time-based
sliding window model. This method overcomes the limitations of traditional techniques in
dealing with data streams and makes it possible to solve data stream problems with
learning-based TOP-N queries. The paper then summarizes the advantages and disadvantages of
the traditional TOP-N query and, based on this analysis, proposes a learning-based TOP-N
query. The method first constructs a knowledge database to store query profiles; once the
knowledge database is established, it is searched directly. When searching the knowledge
database, the density ρ is calculated first, and the query radius r is then obtained from ρ,
so that the N required results can be obtained approximately. When a new batch of data
arrives in the summary database, an update strategy is needed to maintain both the
knowledge database and the summary database.
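The retrieval step (density ρ, then radius r, then N results) can be sketched as below. The formulas are assumptions for illustration, not taken from the thesis: a hypothetical query profile records a past query center, the half-width r0 of the hypercube it searched, and the number n0 of tuples found; ρ is estimated as n0 / (2·r0)^d, and r is chosen so that the expected count ρ·(2r)^d equals N. The names `QueryProfile`, `density`, `query_radius`, `top_n`, and the radius-doubling fallback are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class QueryProfile:
    """Hypothetical profile: past query center, hypercube half-width r0, tuples found n0."""
    center: Point
    r0: float
    n0: int


def density(profile: QueryProfile) -> float:
    # Assumed estimate: tuples per unit volume of the profile's hypercube (side 2 * r0).
    d = len(profile.center)
    return max(profile.n0, 1) / (2 * profile.r0) ** d  # max(..., 1) avoids rho = 0


def query_radius(rho: float, n: int, d: int) -> float:
    # Choose r so that the expected count rho * (2r)^d equals n.
    return (n / rho) ** (1.0 / d) / 2


def top_n(data: Sequence[Point], q: Point, profiles: Sequence[QueryProfile],
          n: int, max_expand: int = 5) -> List[Point]:
    d = len(q)
    # "Learning" step (assumed): reuse the stored profile whose center is closest to q.
    best = min(profiles, key=lambda p: math.dist(p.center, q))
    r = query_radius(density(best), n, d)
    candidates: List[Point] = []
    for _ in range(max_expand):
        # Range query: tuples inside the hypercube of half-width r around q.
        candidates = [t for t in data if max(abs(a - b) for a, b in zip(t, q)) <= r]
        if len(candidates) >= n:
            break
        r *= 2  # too few results: enlarge the search region and retry
    # Rank the candidates by Euclidean distance to q and keep the N closest.
    candidates.sort(key=lambda t: math.dist(t, q))
    return candidates[:n]
```

For example, with a profile centered at (0, 0) with r0 = 0.5 and n0 = 4 in two dimensions, ρ = 4 and a request for N = 4 gives r = 0.5. Reusing a nearby profile avoids scanning the whole window to size the search region, at the cost of returning an approximate answer when the local density has changed.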
Keywords data stream sliding window model summary database TOP-