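Undergraduate Thesis
Title: (Chinese) PARADISE前端平台测试与优化 [Testing and Optimization of the PARADISE Front-End Platform]
(English) Evaluation and Optimization for the Front Service of PARADISE
Name:
Student ID:
Department: School of Information Science and Technology
Major: Computer Science and Technology
Advisor:
November 11, 2017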
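Abstract (Chinese version)
PARADISE is an intelligent Chinese search engine platform consisting of two parts, a front end and a back end, which interact at run time to deliver the platform's full functionality. Starting from the functional requirements of the PARADISE front end, this thesis describes in detail the design and implementation of each front-end functional module and explains how the front end was tested and optimized. The PARADISE front end is responsible for extracting search engine snippets, yet there is currently no clear framework or algorithm governing the principles and rules by which snippets are selected. This thesis therefore discusses in detail the significance, criteria, and classification of search engine snippets. Taking dynamic snippets centered on the user's query as its starting point, it gives formal principles for the algorithm and implements a concrete dynamic snippet algorithm. Compared with the dynamic snippets of the Baidu search engine, experiments show that the algorithm's consistency is 6% higher than Baidu's. Consistency here refers to the relationship between the relevance of the snippet to the user's query and the relevance of the original document to the user's query: the closer the two are, the higher the consistency. The algorithm has been deployed in Peking University's campus search engine.
Keywords: query, keyword, snippet, query log, click log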
Abstract
PARADISE is a Platform for Applying Researching And Developing intelligent Search Engine, composed of two components: a front service and a backstage supporter. The interaction of the two components delivers the full functionality of PARADISE. This paper describes in detail the implementation of each function of the front service and explains the methods used to evaluate and optimize it. The PARADISE front service must produce a search engine snippet for each search result. However, the selection principles and algorithmic rules for search engine snippets have not been clearly defined. This paper provides formal principles for algorithm implementation grounded in query-biased dynamic snippets, and implements a dynamic snippet algorithm within this framework. Experiments show that the coherence of our algorithm is 6% higher than that of Baidu. Coherence here denotes how closely the relevance of the snippet to the user's query matches the relevance of the original document to the user's query. The algorithm has been applied to PKU's campus search engine.
Keywords: query, keyword, snippet, query log, clickthrough
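Contents
Chapter 1 Introduction
Chapter 2 Design and Implementation of the PARADISE Front End
PARADISE Front-End Functional Requirements and Their Implementation
PARADISE Front-End Functional Modules, Their Relationships, and Workflow
PARADISE Front-End Functional Modules
Workflow of the PARADISE Front-End Functional Modules
PARADISE Front-End Performance Optimization
PARADISE Front-End Time Performance Optimization
PARADISE Front-End Display Optimization: Site Clustering
Chapter 3 Survey of Search Engine Snippets
Background
Automatic Summarization
Click Logs
Query Classification
Web Page Body Text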