Document name:

大数据.pdf

Format: PDF   Pages: 341

Uploaded by: 精品库, 2016/4/21   File size: 0 KB


Document description: Mining of Massive Datasets

Anand Rajaraman, Kosmix, Inc.
Jeffrey D. Ullman, Stanford Univ.

Copyright © 2010, 2011 Anand Rajaraman and Jeffrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
7. Two key problems for Web applications: managing advertising and recommendation systems.

Prerequisites

CS345A, although its number indicates an advanced graduate course, has been found accessible by advanced undergraduates and beginning masters students.
In the future, it is likely that the course will be given a mezzanine-level numb
