HEBEI UNIVERSITY

Security classification:
Classification number:
University code: 10075
Student ID: 20101179
Master's Thesis

Processing of Chinese Keyword Top-N Queries
over a Database with Multiple Relations

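Candidate: Pan Lijuan
Supervisor: Prof. Zhu Liang
Degree category: Master of Engineering
Specialty: Computer Software and Theory
Degree-granting institution: Hebei University
Date of defense: May 2013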
Classified Index:    CODE: 10075
NO: 20101179

A Dissertation for the Degree of M. Engineering

Processing of Chinese Keyword Top-N
Queries over a Database with Multiple
Relations

Candidate: Pan Lijuan
Supervisor: Prof. Zhu Liang
Academic Degree Applied for: Master of Engineering
Specialty: Computer Software and Theory
University: Hebei University
Date of Oral Examination: May 2013
Abstract
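The theory and techniques of keyword queries have been studied extensively and applied in
information retrieval and Web search engines. Traditional database management systems
support only pattern matching, not free-form keyword queries. For this reason, keyword query
processing over relational databases has become one of the most closely watched research
frontiers in recent years. Traditional relational database systems operate on the database
through the Structured Query Language (SQL), which requires users to master both SQL and the
database schema; this is difficult for ordinary users. In addition, traditional database
systems can apply only simple sorting to the returned results, so users find it hard to
extract the information that interests them most. Because current research on keyword
queries focuses mainly on English keywords, this thesis presents a method for processing
Chinese keyword top-N queries over databases with multiple tables. The method creates an
index table that stores the Chinese characters extracted from database tuples (tuple
characters) together with related information, builds an index on it to match query
keywords quickly, and adapts similarity formulas from IR to construct a ranking strategy
suited to Chinese keyword queries. For a Chinese keyword query, the index is used to match
query characters against tuple characters quickly and retrieve the corresponding
information; from this information the method builds a candidate-tuple generation linked
list and SQL query statements, obtains the candidate tuples and their similarity to the
query, and finally returns the Top-N results ordered by similarity. The method supports
character-level search and the processing of queries containing Chinese abbreviations.
Finally, experiments on real data sets evaluate query response time and accuracy, and the
results show that the proposed method is effective.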

Keywords: relational database; Chinese keywords; index; ranking strategy
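The abstract only outlines the method, so the Python sketch below is an illustration under stated assumptions, not the thesis's implementation. It builds an in-memory character index over every text column of several SQLite tables (standing in for the index table of extracted tuple characters), scores tuples with a generic TF-IDF-style weight damped by tuple length (the thesis's own IR-derived formula is not given here, so this one is assumed), and then issues one SQL query per table to fetch the Top-N tuples. The function names `build_char_index` and `top_n`, the scoring formula, and the sample tables are all hypothetical. Because matching works per character, the abbreviation 河大 still matches 河北大学.

```python
import heapq
import math
import sqlite3
from collections import Counter, defaultdict


def is_cjk(ch):
    # Treat the basic CJK Unified Ideographs block as "Chinese characters".
    return "\u4e00" <= ch <= "\u9fff"


def build_char_index(conn, tables):
    """Hypothetical character index: for each Chinese character, record which
    tuples (table, rowid) contain it and how often."""
    index = defaultdict(dict)  # char -> {(table, rowid): term frequency}
    lengths = {}               # (table, rowid) -> number of Chinese characters
    for table in tables:  # table names come from a trusted list, not user input
        for row in conn.execute(f"SELECT rowid, * FROM {table}"):
            key = (table, row[0])
            text = "".join(v for v in row[1:] if isinstance(v, str))
            chars = Counter(ch for ch in text if is_cjk(ch))
            lengths[key] = sum(chars.values())
            for ch, tf in chars.items():
                index[ch][key] = tf
    return index, lengths


def top_n(conn, index, lengths, query, n):
    """Score tuples with an assumed TF-IDF-style weight, keep the best n,
    then fetch those tuples with one SQL query per table."""
    total = len(lengths)
    scores = defaultdict(float)
    for ch in {c for c in query if is_cjk(c)}:  # character-level matching
        postings = index.get(ch)
        if not postings:
            continue
        idf = math.log(1 + total / len(postings))
        for key, tf in postings.items():
            # Damp by tuple length so long tuples are not favoured unduly.
            scores[key] += (1 + math.log(tf)) * idf / math.sqrt(lengths[key])
    best = heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])

    # Group the winning rowids by table and fetch them with parameterized SQL.
    by_table = defaultdict(list)
    for (table, rowid), _ in best:
        by_table[table].append(rowid)
    rows = {}
    for table, ids in by_table.items():
        marks = ",".join("?" * len(ids))
        sql = f"SELECT rowid, * FROM {table} WHERE rowid IN ({marks})"
        for row in conn.execute(sql, ids):
            rows[(table, row[0])] = row[1:]
    return [(key, score, rows[key]) for key, score in best]


# Hypothetical two-table database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE school (name TEXT, city TEXT)")
conn.execute("CREATE TABLE teacher (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO school VALUES (?, ?)",
                 [("河北大学", "保定"), ("北京大学", "北京")])
conn.executemany("INSERT INTO teacher VALUES (?, ?)",
                 [("张三", "计算机学院"), ("李四", "河北大学数学系")])
index, lengths = build_char_index(conn, ["school", "teacher"])

# The abbreviation 河大 matches 河北大学 character by character.
for key, score, row in top_n(conn, index, lengths, "河大", 2):
    print(key, round(score, 4), row)
```

Scoring straight from the index keeps SQL traffic down to the final N tuples; the thesis instead derives a candidate-tuple generation linked list and SQL statements before computing similarity, a step this sketch does not reproduce.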

Abstract
The theory and techniques of keyword queries have been extensively studied and applied in
information retrieval and Web search engines. Traditional relational database management
systems support pattern matching of tuples against query conditions; however, they do not
support free-form keyword search. Research on processing keyword queries over relational
databases has therefore intensified in recent years and has become an active research topic.
Traditional relational database systems use SQL (Structured Query Language) to query the
database, which requires users to know both SQL and the database schema. These requirements
make such a search model difficult for ordinary users. Additionally, the ranking functions