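Hebei University
Master's Thesis
Top-N Relational Query Processing for Chinese Keywords Based on a Knowledge Base
Name: Zhu Yong (竹勇)
Degree applied for: Master
Major: Computer Application Technology
Supervisor: Zhu Liang (朱亮)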
2011-05
Chinese Abstract
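Drawing on the theory and techniques of keyword-based document retrieval in IR and on the Web,
supporting free-form keyword queries over relational databases has become an active research
topic. Keyword queries are well suited to Web databases because users need no knowledge of the
database schema or SQL, but both the processing algorithms and the ranking methods require
careful study and design. Existing research on keyword queries mainly targets English text
attributes. This thesis studies top-N keyword queries over Chinese text attributes. The approach
not only provides IR-style retrieval but also handles Chinese abbreviations, and it retrieves by
individual character, which avoids the word segmentation problem.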
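Because Chinese and English differ greatly, English keyword query techniques cannot be applied
directly to Chinese. The main idea of the method presented in this thesis is as follows. A
knowledge base is created to store information about the Chinese text attributes in the database
and about the characters of their tuples (tuple characters), and an index is built from this
knowledge base to enable fast query processing. For a keyword query, the query characters are
matched one by one against the tuple characters through the index, yielding a set of candidate
tuple identifiers. A ranking method is defined from how the query characters match the tuple
characters and from the information stored in the knowledge base. The candidate tuple identifiers
are ranked with this method, and the set of candidate tuples is retrieved. Query phrase matching
is then performed on the candidate tuples to improve query accuracy. Finally, the top-N results
are returned. A prototype system was built, and experiments on real data sets show that the
proposed method is effective in terms of query time and accuracy.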
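The pipeline outlined above (character-level index, candidate tuple identifiers, ranking, phrase matching, top-N) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `build_char_index` and `top_n`, the in-memory dictionary standing in for a text attribute, and the scoring rule (fraction of distinct query characters matched, plus a bonus of 1.0 for a contiguous phrase match) are all assumptions chosen for illustration. The abstract does not specify the actual ranking formula or the knowledge base contents.

```python
from collections import defaultdict
import heapq


def build_char_index(rows):
    """Map each character to the set of tuple ids whose text contains it."""
    index = defaultdict(set)
    for tid, text in rows.items():
        for ch in text:
            if not ch.isspace():
                index[ch].add(tid)
    return index


def top_n(query, rows, index, n=3):
    """Return up to n (tuple_id, score) pairs, best first."""
    # Distinct query characters, in order of first appearance.
    chars = [ch for ch in dict.fromkeys(query) if not ch.isspace()]
    if not chars:
        return []
    # Step 1: candidate ids and how many distinct query characters each matches.
    hits = defaultdict(int)
    for ch in chars:
        for tid in index.get(ch, ()):
            hits[tid] += 1
    # Step 2 (assumed scoring): fraction of query characters matched,
    # plus 1.0 when the whole query occurs as a contiguous phrase.
    scored = []
    for tid, count in hits.items():
        score = count / len(chars)
        if query in rows[tid]:
            score += 1.0
        scored.append((tid, score))
    # Step 3: keep the n best; ties are broken by the smaller tuple id.
    return heapq.nsmallest(n, scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical text attribute: tuple id -> Chinese text value.
rows = {1: "河北大学", 2: "河北工业大学", 3: "北京大学", 4: "清华大学"}
index = build_char_index(rows)
print(top_n("河大", rows, index, n=3))
print(top_n("北京大学", rows, index, n=2))
```

Because matching is done per character, the abbreviation "河大" still reaches "河北大学" without any word segmentation step, while the phrase bonus lifts tuples that contain the query verbatim.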
Keywords: relational database; Chinese keywords; ranking strategy; knowledge base; top-N query
Abstract
Inspired by the theory and technology of Information Retrieval and Web search engines,
free-style keyword search over relational databases has become an active research focus in
information processing. Keyword search methods are well suited to Web databases, and they do
not require users to know the database schema or a structured query language. Ranking
methods and processing algorithms are key issues in keyword search. Existing keyword search
technologies usually focus on English keywords; in this thesis we discuss Chinese keyword
search in a relational database. We propose a new method that realizes free-style Chinese
keyword search over relational databases, avoids the problem of Chinese word segmentation,
and improves search accuracy.
English and Chinese differ considerably in grammar and in their basic units of information,
so it is difficult to apply English keyword search technologies directly to Chinese keyword
search. The main ideas of the method in this thesis are as follows. First, a knowledge base is
built to store information about the attributes and tuple words in a database; then an index
is created from the knowledge base and used to match the query words and the tuple words
one by