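Huazhong University of Science and Technology
Master's Thesis
Design and Implementation of a Data Deduplication System in a Virtual Desktop Environment
Name: Hou Haixiang
Degree applied for: Master
Major: Software Engineering
Supervisor: Chen Chuanbo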
2011-05-11
Abstract
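A data deduplication system identifies the redundant portions of stored data and eliminates them by representing
repeated data blocks with pointers or other markers. This reduces the storage space consumed and lowers storage
cost.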
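The pointer-replacement idea can be illustrated with a minimal sketch. The thesis does not specify its chunking or fingerprinting scheme, so the choices here are assumptions: data is split into fixed-size chunks, each chunk is identified by its SHA-256 digest, only one copy per digest is kept, and each file is stored as an ordered list of digests (its "pointers"). The class `DedupStore` and its methods are hypothetical names used only for illustration, and the sketch assumes that equal digests imply equal content (no hash collisions).

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Hypothetical in-memory deduplicating store (illustrative sketch only)."""

    def __init__(self, chunk_size: int = 4096) -> None:
        # Assumed fixed chunk size in bytes; the thesis does not state one.
        self.chunk_size = chunk_size
        # Fingerprint -> the single stored copy of that chunk.
        self.chunks: Dict[str, bytes] = {}
        # File name -> ordered list of fingerprints that act as pointers.
        self.recipes: Dict[str, List[str]] = {}

    def write(self, name: str, data: bytes) -> None:
        recipe: List[str] = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this fingerprint has not been seen before;
            # otherwise the file simply references the existing copy.
            if fp not in self.chunks:
                self.chunks[fp] = chunk
            recipe.append(fp)
        self.recipes[name] = recipe

    def read(self, name: str) -> bytes:
        # Rebuild the file by following its pointers in order.
        return b"".join(self.chunks[fp] for fp in self.recipes[name])

    def logical_bytes(self) -> int:
        # Size of all files as seen by users, before deduplication.
        return sum(len(self.chunks[fp]) for r in self.recipes.values() for fp in r)

    def physical_bytes(self) -> int:
        # Size actually occupied by unique chunks.
        return sum(len(c) for c in self.chunks.values())

    def dedup_ratio(self) -> float:
        phys = self.physical_bytes()
        return self.logical_bytes() / phys if phys else 1.0


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.write("a", b"AAAABBBBAAAA")
    store.write("b", b"BBBBAAAACCCC")
    print(len(store.chunks))    # 3 unique chunks: AAAA, BBBB, CCCC
    print(store.dedup_ratio())  # 24 logical bytes / 12 physical bytes -> 2.0
```

In this toy run, two 12-byte files share the chunks `AAAA` and `BBBB`, so only 12 bytes are physically stored for 24 logical bytes. A production system would persist the fingerprint index and chunk store on disk, which is where the computational and I/O costs discussed below arise.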
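An analysis of common data deduplication systems shows that the vast majority were developed to cope with the
explosive growth of data and are used mainly for large-scale data backup and content-addressable storage. When
data volumes reach the terabyte scale or beyond, a deduplication system can substantially reduce backup costs.
Such systems also have inherent drawbacks: they do not coexist well with replica systems that provide disaster
recovery; in online mode, their heavy computation significantly degrades disk read/write performance; and
dedicated hardware acceleration can reduce this impact but is too expensive. Redesigning the architecture of the
deduplication system for a specific storage platform and adopting suitable deduplication techniques makes it
possible to strike an effective balance between system performance and hardware cost.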
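Focusing on a specific storage platform makes it possible to fully exploit the structure and redundancy
characteristics of the data on that platform, and to optimize the system architecture according to how the
redundant data is distributed. A well-reasoned design of each module's functions and of the interactions between
modules not only improves the efficiency of the deduplication system but also makes full use of the available
hardware resources, improving the cost-performance ratio and energy efficiency of the platform as a whole.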
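Adopting suitable deduplication techniques keeps intermediate computation to a minimum while still eliminating
redundant data, which reduces the system resources consumed. The storage platform can then give more resources to
applications, improving storage quality of service and preserving the user experience.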
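The system is built on a two-layer deduplication architecture. By making full use of the spare resources across
the platform and optimizing with appropriate deduplication techniques, the system operates correctly while
removing redundant data in real time and meets the basic performance requirements of the storage platform.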

Keywords: virtualization, data deduplication, content-addressable storage, distributed storage
Abstract
The goal of a data deduplication system is to eliminate redundant data by replacing it with a pointer or another
marker, which allows more data to be stored in the existing capacity and reduces total storage cost.
A study of current data deduplication systems shows that most of them emerged in response to the explosive growth
of data and are used for large-scale data backup or content-addressable storage. When the data volume reaches the
terabyte scale or beyond, a data deduplication system can reduce total storage cost substantially. Such systems
also have drawbacks: they cannot coexist well with the replica mechanisms that provide disaster recovery; their
heavy computation badly degrades I/O performance when they operate in online mode; and minimizing this impact
with hardware acceleration is far too costly. For a specific platform, it is therefore best to redesign the
architecture of the data deduplication system. With the help of suitable data deduplication techniques, the
system can then strike a balance between performance and
cos