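Document title:

基于固态硬盘的倒排索引构建与维护策略分析.docx (Analysis of Inverted Index Construction and Maintenance Strategies Based on Solid State Drives)

Format: docx   Size: 1,248 KB   Pages: 66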

Uploaded by: wz_198613   2018/6/1   File size: 1.22 MB

ABSTRACT
The inverted index is the core data structure of Information Retrieval (IR) systems. With the rapid growth of digitization, massive inverted indexes must be maintained, and they are usually stored on hard disks. Although hard disks offer high capacity at a low price, they are much slower than the CPU and main memory, and because of their mechanical nature this gap is unlikely to close in the near future; as a result, disk I/O becomes the bottleneck of IR systems. Flash-based Solid State Drives (SSDs), on the other hand, are a hot research topic in the field of data storage. Compared with conventional hard disks, the most outstanding advantage of SSDs is their much higher I/O performance. Therefore, if inverted indexes are stored on SSDs instead of hard disks, the overall performance of IR systems should improve. However, existing index management strategies were all designed for hard disks; given the unique characteristics of SSDs, these approaches not only fail to make full use of an SSD but can also be harmful to it.
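For readers unfamiliar with the data structure, the following is a minimal in-memory sketch of an inverted index: each term maps to a sorted postings list of the IDs of the documents that contain it. It is illustrative only; the function names `build_inverted_index` and `query_and` and the whitespace tokenizer are assumptions, not the thesis's implementation, and a real IR system stores the postings lists on disk (or SSD) in compressed form.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted postings list of document IDs (illustrative sketch)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    # Postings lists are conventionally kept sorted by document ID.
    return {term: sorted(ids) for term, ids in index.items()}

def query_and(index, *terms):
    """Return IDs of documents that contain every query term (Boolean AND)."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "flash based SSD", 2: "hard disk index", 3: "SSD index maintenance"}
index = build_inverted_index(docs)
print(index["index"])                    # [2, 3]
print(query_and(index, "SSD", "index"))  # [3]
```

Answering a query only requires reading the postings lists of the query terms, which is why the speed of the device holding those lists matters so much.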
First, this thesis analyzes existing index construction and maintenance approaches on SSDs. Because these conventional strategies were all designed for hard disks, the experimental results show that the in-place method is inefficient and produces a large number of random writes, while the merge-based method generates heavy write traffic on the SSD, which can further shorten its lifetime. Based on this analysis, both the proposed index construction and maintenance strategies retain the basic idea of merge-based methods, but are designed to eliminate the extra write traffic triggered by merge operations.
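The write-traffic argument against merge-based maintenance can be made concrete with a back-of-the-envelope model. This sketch rests on assumptions not stated in the thesis: new postings fill an in-memory buffer of `buffer_size` postings, every posting costs one unit of write, and the merge-based strategy is the simplest "immediate merge" policy, which rewrites the entire on-disk index each time the buffer is flushed. Under these assumptions, for N postings and a buffer of M postings (N divisible by M), the total written is M(1 + 2 + ... + N/M) = N(N/M + 1)/2, roughly N²/(2M), whereas in-place update writes each posting once but to scattered locations (random writes). The function names are hypothetical.

```python
def immediate_merge_writes(total_postings, buffer_size):
    """Units written when every buffer flush rewrites the whole on-disk index (assumed policy)."""
    on_disk = 0    # postings currently stored in the on-disk index
    written = 0    # cumulative units written to the drive
    remaining = total_postings
    while remaining > 0:
        batch = min(buffer_size, remaining)
        remaining -= batch
        on_disk += batch
        # Merging the buffer with the old index writes out the full merged index.
        written += on_disk
    return written

def write_amplification(total_postings, buffer_size):
    """Units written per unit of new data; in-place update would be 1.0 under this model."""
    return immediate_merge_writes(total_postings, buffer_size) / total_postings

print(immediate_merge_writes(1000, 100))   # 5500
print(write_amplification(1000, 100))      # 5.5
print(write_amplification(10000, 100))     # 50.5
```

The amplification grows linearly with the number of flushes, which is why repeated merges are costly for flash endurance. More refined merge policies reduce, but do not remove, this rewriting, which is consistent with the thesis's goal of keeping the merge-based idea while eliminating the extra writes.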
Second, a new index construction approach and a new index maintenance approach are proposed. The proposed index construction method stores the temporary index files that are produced