Abstract (Chinese)
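Cloud computing is a popular research topic today. As an offshoot of cloud computing, cloud
storage has also become one of the most active research areas both in China and abroad. The
Hadoop Distributed File System (HDFS), an open-source implementation of the Google File System,
has become the standard reference model for industry when studying cloud computing and cloud
storage and when building cloud applications and services. However, the current HDFS
architecture has several shortcomings. The most notable are poor support for small files and a
single NameNode that can easily become the performance bottleneck of the entire cluster.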
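Building on a study of the existing HDFS, this paper proposes solutions to both problems. For
the small-file problem, it introduces a user metadata space so that small files stored in HDFS
can be merged and stored as large files. For the single-NameNode performance bottleneck, it
proposes a multi-NameNode solution based on MongoDB. Experimental results show that the
proposed solutions not only expand the namespace of an HDFS cluster but also improve its
concurrent read and write speed.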
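The small-file merging idea can be illustrated with a minimal sketch. It rests on our own assumptions and is not the paper's implementation: the class name `SmallFileMerger`, the container size limit, and the in-memory `bytearray` containers standing in for large HDFS files are all hypothetical. The `index` dictionary plays the role of the user metadata space, so the NameNode would only need to track one entry per container rather than one per small file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SmallFileMerger:
    # Hypothetical size limit for one merged container (assumption, not from the paper).
    max_container_bytes: int = 64 * 1024 * 1024
    # Each bytearray stands in for one large file that would be stored in HDFS.
    containers: List[bytearray] = field(default_factory=list)
    # User metadata space: small-file name -> (container id, offset, length).
    index: Dict[str, Tuple[int, int, int]] = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> None:
        """Append a small file to the current container, opening a new one when full."""
        if name in self.index:
            raise KeyError(f"{name!r} is already stored")
        # Start a new container if none exists yet, or if the current non-empty
        # container would exceed the limit after appending this file.
        if not self.containers or (
            len(self.containers[-1]) > 0
            and len(self.containers[-1]) + len(data) > self.max_container_bytes
        ):
            self.containers.append(bytearray())
        cid = len(self.containers) - 1
        offset = len(self.containers[cid])
        self.containers[cid].extend(data)
        self.index[name] = (cid, offset, len(data))

    def get(self, name: str) -> bytes:
        """Read a small file back by looking up its location in the metadata index."""
        cid, offset, length = self.index[name]
        return bytes(self.containers[cid][offset:offset + length])


if __name__ == "__main__":
    merger = SmallFileMerger(max_container_bytes=100)
    for i in range(25):
        merger.put(f"f{i}", bytes([i]) * 10)
    print(len(merger.containers), merger.index["f12"])
```

With `max_container_bytes=100`, the 25 ten-byte files fill 3 containers, so a NameNode would track 3 files instead of 25, while each read is served from the `(container id, offset, length)` triple held in the metadata index.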
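Beyond optimizing the existing HDFS architecture, this paper also builds a cloud storage system
on top of it that supports uploading, downloading, sharing, and browsing files. The system can
also monitor the HDFS cluster. The monitored information includes cluster capacity, cluster
block information, and the load and CPU usage of individual nodes. The implementation of this
cloud storage system offers exploratory and practical guidance for HDFS-based applications.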
Keywords: cloud storage; HDFS; distributed file system; small files
Abstract
Cloud computing is currently a major research focus. As a derivative of cloud computing,
cloud storage has also become one of the most popular research fields in China and abroad. As
the open-source implementation of the Google File System (GFS), the Hadoop Distributed File
System (HDFS) has become the standard model for research on cloud computing and cloud storage
and for the implementation of cloud applications and cloud services. However, the HDFS
architecture has a number of shortcomings, notably its poor support for small files and the
performance bottleneck caused by its single NameNode. How to solve these problems is one of
the hotspots of current research.
Based on a study of HDFS, this paper proposes solutions to these problems. For the small-file
problem, it introduces a user metadata space so that small files are merged and stored as
large files in HDFS; for the single-NameNode performance bottleneck, it proposes a
multi-NameNode solution based on MongoDB. Experimental results show that the proposed
solutions not only expand the namespace of the HDFS cluster but also improve the
concurrent read and write spe