Web Mining (5)
杨光飞 (Yang Guangfei)
Institute of Systems Engineering, Dalian University of Technology

5 - Search Engines

Search Engines: Overview

The search engine rankings for January 2012 (comScore) were: Google, Bing, Yahoo, Ask (grew to 3 percent), AOL …

Search Engine Characteristics
  Unedited: anyone can enter content
    Quality issues; spam
  Varied information types
    Phone books, brochures, catalogs, dissertations, news reports, weather, all in one place!
  Different kinds of users
    Lexis-Nexis: paying, professional searchers
    Online catalogs: scholars searching scholarly literature
    Web: every type of person with every type of goal
  Scale
    Hundreds of millions of searches per day; billions of documents

Web Search Queries
  Web search queries are short: ~… (Aug 2000)
    Has increased, … (~1997)
  User expectations:
    Many say "The first item shown should be what I want to see!"
    … common notion in mind, …

Standard Web Search Engine Architecture
[Figure: standard web search engine architecture. Labels: crawl the web; check for duplicates, store the documents; DocIds; create an inverted index; inverted index; search engine servers; user query; show results to user]

[Figure: Brin & Page 98]

Inverted Indexes
  Inverted indexes are still used, even though the web is so huge
  Some systems partition the indexes across different machines; each machine handles different parts of the data
  Other systems duplicate the data across many machines
  … combination of these

In this example, …
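The inverted-index slide can be made concrete with a small sketch. The code assumes a toy in-memory corpus with integer doc IDs and naive whitespace tokenization. All function names here (`build_index`, `search_and`, `partition_by_document`, `search_sharded`) are hypothetical and not taken from the slides.

The sketch shows two things:
- How each term maps to the IDs of the documents that contain it, and how a multi-word query can be answered by intersecting those lists (Boolean AND).
- The "partition across machines" scheme, with each shard standing in for a separate machine that indexes its own subset of documents.

It does not model replication (the "duplicate the data" option).

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive; an assumption)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> sorted list of doc IDs containing it.

    docs: dict mapping an integer doc ID to its text (hypothetical format).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_and(index, query):
    """Return the sorted doc IDs that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # intersect postings lists
    return sorted(result)


def partition_by_document(docs, n_shards):
    """Split documents across shards (stand-ins for machines), index each shard."""
    shards = [dict() for _ in range(n_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % n_shards][doc_id] = text  # each doc lives on one shard
    return [build_index(shard) for shard in shards]


def search_sharded(shard_indexes, query):
    """Send the query to every shard and merge the partial result lists."""
    hits = []
    for shard_index in shard_indexes:
        hits.extend(search_and(shard_index, query))
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        1: "inverted indexes map terms to documents",
        2: "search engines crawl the web",
        3: "the web is huge so engines partition indexes",
    }
    index = build_index(docs)
    print(search_and(index, "web engines"))  # [2, 3]
    shards = partition_by_document(docs, 2)
    print(search_sharded(shards, "web engines"))  # [2, 3]
```

Partitioning by document means every shard holds complete postings for its own documents. Each shard can therefore answer an AND query independently, and the front end only has to merge the results. Real engines also rank those results, which this sketch leaves out.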