Dynamic Update Strategies for Inverted Indexes on Solid State Disks and Their Optimization Analysis

Abstract
Index maintenance strategy plays a crucial role in a full-text retrieval system that must handle dynamic text collections and satisfy users’ real-time query requirements. Existing index maintenance methods are mainly designed around the characteristics of the hard disk drive (HDD), and their performance is limited by the HDD’s relatively low I/O performance. The emerging solid state disk (SSD) has many desirable merits compared with HDD, the most prominent being its high random-access performance. If SSD is properly used instead of HDD to store the inverted index, the system’s overall performance will improve greatly. However, SSD has some characteristics that are entirely different from those of HDD, and existing index maintenance methods do not take them into account. Therefore, directly adopting existing methods to maintain an index on SSD will not only fail to make full use of SSD’s advantages, but also harm the SSD.
First, the existing index maintenance methods are analyzed on SSD through experiments and found to be no longer suitable for it: the pure in-place method produces too many random writes, while the merge-based method generates a massive volume of writes that appear unnecessary on SSD, imposing heavy write traffic on the SSD and harming it. Based on the experimental results, we propose some principles for designing an SSD-based index maintenance strategy.
Second, a new hybrid index update strategy is proposed. The strategy classifies all terms as short or long according to the length of their posting lists and maintains their indexes with the no-merge and in-place methods, respectively, relying on SSD’s fast random reads and relatively efficient semi-random writes. In this way, inefficient small random writes are avoided and the extra write operations caused by merging are also prevented. Experimental results demonstrate that, compared to existing methods, our design improves comprehensive index maintenance and query performance; meanwhile, it is friendlier to SSD, espec
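The hybrid classification described above can be illustrated with a minimal in-memory sketch. It is not the thesis's implementation: the class name `HybridIndex`, the `long_threshold` cutoff, and the rule that promotes a term from short to long once its list reaches the cutoff are all hypothetical, and Python lists stand in for on-SSD storage.

```python
from collections import defaultdict


class HybridIndex:
    """Toy model of a hybrid update strategy (all names are hypothetical).

    Short terms: each flush appends a new segment and never merges (no merge).
    Long terms: new postings are appended to one contiguous list (in-place).
    """

    def __init__(self, long_threshold=4):
        # Assumed cutoff, in postings, between "short" and "long" terms.
        self.long_threshold = long_threshold
        # term -> list of segments; each segment is a list of doc ids.
        self.short_segments = defaultdict(list)
        # term -> single contiguous posting list.
        self.long_lists = {}

    def flush(self, buffered):
        """Write one batch of buffered postings ({term: [doc ids]})."""
        for term, postings in buffered.items():
            if term in self.long_lists:
                # In-place: extend the existing list where it lives.
                self.long_lists[term].extend(postings)
                continue
            # No merge: write a fresh segment, leave older segments untouched.
            self.short_segments[term].append(list(postings))
            total = sum(len(seg) for seg in self.short_segments[term])
            if total >= self.long_threshold:
                # Assumed promotion rule: gather the segments into one list.
                segments = self.short_segments.pop(term)
                self.long_lists[term] = [d for seg in segments for d in seg]

    def lookup(self, term):
        """Return the full posting list for a term."""
        if term in self.long_lists:
            return list(self.long_lists[term])
        # A short term is read from several segments (random reads on SSD).
        return [d for seg in self.short_segments.get(term, []) for d in seg]


if __name__ == "__main__":
    idx = HybridIndex(long_threshold=3)
    idx.flush({"ssd": [1], "index": [1, 2, 3]})
    idx.flush({"ssd": [4], "index": [5]})
    print(idx.lookup("ssd"), idx.lookup("index"))
```

In this sketch a query for a short term touches several segments, which is the cost the strategy accepts in exchange for never rewriting old data; a real system would choose the threshold and storage layout from measurements on the target device.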