Document introduction: Web Mining (5)
Yang Guangfei, Institute of Systems Engineering, Dalian University of Technology
******@-

5 - Search Engines

Search Engine Overview
  The search engine rankings for January 2012, ...Score, were:
    Google ( ). Bing ( ). Yahoo ( ). Ask grew to 3 percent ( ). AOL

Search Engine Characteristics
  Unedited: anyone can enter content
    Quality issues; spam
  Varied information types
    Phone book, brochures, catalogs, dissertations, news reports, weather, all in one place!
  Different kinds of users
    Lexis-Nexis: paying, professional searchers
    Online catalogs: scholars searching scholarly literature
    Web: every type of person with every type of goal
  Scale
    Hundreds of millions of searches/day; billions of docs

Web Search Queries
  Web search queries are short: ~ (Aug 2000)
    Has increased, (~1997)
  User expectations
    Many say "The first item shown should be what I want to see!"
    ... common notion in mind, ...

Standard Web Search Engine Architecture
  [Diagram labels: crawl the web; create an inverted index; check for duplicates, store the documents; inverted index; search engine servers; user query; show results to user; DocIds]

  [Figure: Brin & Page 98]

Inverted Indexes
  Inverted indexes are still used, even though the web is so huge
  Some systems partition the indexes across different machines; each machine handles different parts of the data
  Other systems duplicate the data across many machines; ... combination of these

In this example, ...
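
The inverted-index and partitioning ideas in the slides above can be illustrated with a minimal Python sketch. It is not the architecture of any particular search engine. The tokenizer, the sample documents, the boolean-AND matching, and the rule of assigning documents to shards by `doc_id % num_shards` are all assumptions made for illustration; the function names (`build_index`, `search`, `partition`, `search_shards`) are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the doc ids that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


def partition(docs, num_shards):
    """Split documents across shards by doc id (assumed rule);
    each shard builds an index over its own part of the data."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % num_shards][doc_id] = text
    return [build_index(shard) for shard in shards]


def search_shards(shard_indexes, query):
    """Send the query to every shard and merge the returned doc ids."""
    hits = []
    for index in shard_indexes:
        hits.extend(search(index, query))
    return sorted(hits)


# Made-up sample documents, keyed by integer doc id.
docs = {
    0: "web search engines crawl the web",
    1: "an inverted index maps terms to documents",
    2: "search engines build an inverted index",
    3: "duplicate documents are detected before indexing",
}

index = build_index(docs)             # single-machine index
shards = partition(docs, num_shards=2)  # two partial indexes

print(search(index, "inverted index"))
print(search_shards(shards, "search engines"))
```

Because each document lives in exactly one shard, a multi-term query can be answered inside each shard independently and the results simply merged. Duplicating the full index on several machines, the other option the slides mention, would instead let each query be answered by any single machine.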