

Format: PDF   Pages: 341



Uploaded by: 精品库   2016/4/21   File size: 0 KB





Document description:

Mining of Massive Datasets
Anand Rajaraman, Kosmix, Inc.
Jeffrey D. Ullman, Stanford Univ.
Copyright © 2010, 2011 Anand Rajaraman and Jeffrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
7. Two key problems for Web applications: managing advertising and recommendation systems.

Prerequisites

CS345A, although its number indicates an advanced graduate course, has been found accessible by advanced undergraduates and beginning masters students.
In the future, it is likely that the course will be given a mezzanine-level numb

