Document: 海量数据处理案例.ppt (Massive Data Processing Case Study)
Format: PPT, 43 pages
Uploaded by: 智客网, 2011/11/21
A Full Text Search Engine For BBS Lily
Speaker: 顾荣
Advisor: 黄宜华
Email: gurongwalker@
Contents
Background
Brief Intro to the Principle of Full Text Search Engine
Implementation of FTSE for BBS Lily
Maybe Google & Baidu have done these...
Conclusion

What is a full text search engine?


Why do we need it?
What is a full text search engine?
"In a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user."
(From Wikipedia)
Why do we need a FTSE for BBS Lily?
Data in BBS Lily
Base capacity: around 3 million posts in total
Growth rate: over a thousand new posts every day
Post granularity: each post is 1 KB to 4 KB
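To make these figures concrete, the raw text volume can be estimated with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the constant names and the choice of 1 GB = 1024 * 1024 KB are assumptions made for illustration, and "over a thousand" is treated as a lower bound of 1,000.

```python
# Back-of-the-envelope corpus size from the figures on the slide.
TOTAL_POSTS = 3_000_000      # "around 3 million posts"
NEW_POSTS_PER_DAY = 1_000    # "over a thousand every day", used as a lower bound
POST_SIZE_KB = (1, 4)        # each post is 1 KB to 4 KB


def corpus_size_gb(posts, size_kb):
    """Raw text size in GB, taking 1 GB = 1024 * 1024 KB."""
    return posts * size_kb / (1024 * 1024)


def daily_growth_mb(posts_per_day, size_kb):
    """Raw text added per day in MB, taking 1 MB = 1024 KB."""
    return posts_per_day * size_kb / 1024


low_gb, high_gb = (corpus_size_gb(TOTAL_POSTS, kb) for kb in POST_SIZE_KB)
low_mb, high_mb = (daily_growth_mb(NEW_POSTS_PER_DAY, kb) for kb in POST_SIZE_KB)

print(f"corpus: {low_gb:.1f} to {high_gb:.1f} GB")
print(f"growth: {low_mb:.1f} to {high_mb:.1f} MB per day")
```

Under these assumptions the corpus is a few gigabytes of raw text, growing by a few megabytes per day, which is too much to scan linearly for every query.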
Intro to the Principle of Full Text Search Engine
What happens after you press enter?
Abstract IR Architecture
[Figure: abstract IR architecture]
Offline: document acquisition (e.g., web crawling) -> Documents -> Representation Function -> Document Representation -> Index
Online: Query -> Representation Function -> Query Representation -> Comparison Function (against the Index) -> Hits
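The offline and online halves of this architecture can be sketched in a few lines. This is only an illustrative toy, not the implementation used for BBS Lily: the function names (`represent`, `build_index`, `compare`, `search`), the sample documents, and the overlap-count scoring are all assumptions made for the example.

```python
import re


def represent(text):
    """Representation function: text -> set of lowercased word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents):
    """Offline step: store one representation per document id."""
    return {doc_id: represent(text) for doc_id, text in documents.items()}


def compare(query_repr, doc_repr):
    """Comparison function: number of query terms found in the document."""
    return len(query_repr & doc_repr)


def search(index, query):
    """Online step: represent the query, score every document, return hits."""
    query_repr = represent(query)
    scored = [(compare(query_repr, doc_repr), doc_id)
              for doc_id, doc_repr in index.items()]
    # Best score first; ties broken by document id. Zero-score documents are dropped.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


# Hypothetical documents, standing in for acquired posts.
documents = {
    1: "How to install Linux",
    2: "Linux kernel mailing list",
    3: "Campus basketball game",
}
index = build_index(documents)
print(search(index, "linux kernel"))
```

The same representation function is applied to documents and to the query, so the comparison function only ever sees two objects of the same kind.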
About Representation Function
Documents -> Bag of Words -> Inverted Index
From documents to bag of words: case folding, tokenization, stopword removal, stemming
Not captured by the bag of words: syntax, semantics, word knowledge, etc.
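The four steps named above can be sketched as a small function that turns a piece of text into a bag of words. This is a simplified, English-only illustration: the stopword list is a tiny made-up sample, and `crude_stem` is a hypothetical toy that only strips a plural "s", standing in for a real stemming algorithm.

```python
import re
from collections import Counter

STOPWORDS = {"a", "and", "in", "of", "the", "to"}  # tiny illustrative list


def crude_stem(token):
    """Toy stemmer: strip a plural 's' from longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def bag_of_words(text):
    folded = text.lower()                             # case folding
    tokens = re.findall(r"[a-z0-9]+", folded)         # tokenization
    kept = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return Counter(crude_stem(t) for t in kept)       # stemming, then counting


print(bag_of_words("Green eggs and ham"))
```

Word order is gone after this step: the result records only which terms occur and how often.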
A Simple Inverted Index Demo
Doc 1: one fish, two fish
Doc 2: red fish, blue hat
Doc 3: cat in the hat
Doc 4: green eggs and ham
Terms: blue, cat, egg, fish, green, ham, hat, one, red, two
[Figure: term-document matrix and the resulting postings lists for the four documents]
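The index for the four example documents can be rebuilt with a short script. This is a minimal sketch rather than the slide's exact procedure: the stopword list, the plural-stripping rule, and the `(doc_id, term_frequency)` postings format are assumptions chosen so that the resulting terms match the ten listed on the slide.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"and", "in", "the"}  # assumed; just enough for these four documents


def terms(text):
    """Lowercase, tokenize, drop stopwords, and strip a plural 's'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t
            for t in tokens if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency)."""
    index = defaultdict(list)
    for doc_id in sorted(docs):  # ascending ids keep postings lists sorted
        for term, tf in Counter(terms(docs[doc_id])).items():
            index[term].append((doc_id, tf))
    return dict(index)


docs = {
    1: "one fish, two fish",
    2: "red fish, blue hat",
    3: "cat in the hat",
    4: "green eggs and ham",
}
index = build_inverted_index(docs)
for term in sorted(index):
    print(term, index[term])
```

Looking up a query term is then a single dictionary access, for example `index["fish"]`, instead of a scan over every document.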