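Hebei University
Master's Thesis
A Query Expansion Method Based on User Interest and Term Relationships
Name: Cui Yan (崔琰)
Degree applied for: Master
Major: Computer Application Technology
Supervisor: Xu Jianmin (徐建民)
2011-05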
Abstract
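Traditional information retrieval methods do not take user interest into account: different
users who enter the same query receive the same results, so the results cannot be personalized
to each user's actual search intent. This thesis proposes a query expansion method based on
user interest and term relationships. The method yields expansion terms that better match the
user's interests and thereby improves the personalization of retrieval.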
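The main work of this thesis is as follows:
1. Mining user interest terms: the content of the web documents a user browses and the
information in the corresponding web logs are mined. Word segmentation and term-frequency
counting turn each web document into a set of terms that represents the page; the browsing
time, clicks, and other operations recorded in the web log are then mined to compute the
weight of each term in the page, which gives the set of terms representing the user's interests.
2. Expanding the initial query: the weights of the initial query terms are adjusted according
to the synonyms of those terms that appear among the user interest terms; the term
relationships in a domain ontology between the user interest terms and the initial query terms
are analyzed, and the interest terms that have an ontological association with an initial query
term are taken as expansion terms and combined with the initial query into a new query.
3. A comparative experiment was designed in which the proposed query expansion method is
compared with retrieval that uses no query expansion.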
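The first two steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis's actual formulas: the behaviour score (dwell minutes plus click count), the `boost` factor, and the names `interest_terms` and `expand_query` are all hypothetical, and the pages are assumed to be already segmented into terms.

```python
from collections import Counter, defaultdict


def interest_terms(pages, top_k=5):
    """Return the top_k user interest terms with their weights.

    pages: list of dicts with 'terms' (already segmented tokens) plus
    'dwell_seconds' and 'clicks' taken from a hypothetical browsing log.
    """
    weights = defaultdict(float)
    for page in pages:
        counts = Counter(page["terms"])
        total = sum(counts.values())
        if total == 0:
            continue
        # Assumed behaviour score: longer dwell time and more clicks
        # are read as stronger interest in the page.
        behaviour = page["dwell_seconds"] / 60.0 + page["clicks"]
        for term, n in counts.items():
            # Relative term frequency in the page, scaled by behaviour.
            weights[term] += (n / total) * behaviour
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def expand_query(query, interests, synonyms, ontology, boost=0.5):
    """Return a new weighted query.

    query: dict mapping initial query term -> weight.
    interests: dict mapping interest term -> weight.
    synonyms, ontology: dicts mapping a term -> set of related terms
    (stand-ins for a synonym dictionary and a domain ontology).
    """
    expanded = dict(query)
    for q in query:
        # Raise the weight of a query term whose synonyms appear
        # among the user's interest terms.
        for s in synonyms.get(q, set()):
            if s in interests:
                expanded[q] += boost * interests[s]
        # Add interest terms that the ontology relates to the query term.
        for r in ontology.get(q, set()):
            if r in interests and r not in expanded:
                expanded[r] = interests[r]
    return expanded
```

Calling `interest_terms` on a user's browsing records and passing the result to `expand_query` together with the initial query gives a weighted query that contains the original terms plus the ontology-related interest terms.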
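The experimental results show that the proposed method retrieves more relevant documents and
that the retrieval results better match users' personalized needs.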
Keywords: user interest; query expansion; synonyms; ontology
Abstract
Traditional information retrieval methods do not take users' interests into consideration.
Different users receive the same results when they enter the same query, so they cannot get
personalized results based on their real search intentions. This thesis presents a query
expansion method based on users' interests and the relationships between terms. Starting from
the initial query, it produces expansion terms that better express the user's intention and
thereby improves the level of personalization.
The major tasks of the thesis are as follows:
1. Mining the terms users are interested in: By mining the web documents users browse and the
corresponding log information, we obtain the set of terms representing each web page through
operations such as word segmentation and term-frequency counting. From the time users spend
browsing a page and the recorded click operations, we then calculate the weights of the terms
in the page to obtain the set of terms representing users' interests.
2. Expanding the initial query: We adjust the weights of the initial query terms according to
the synonyms of those terms found among the terms representing users' interests; we then
combine the interest terms that are related to the initial query terms through the ontology
with the initial query to form new queries.
3. Experi