Text and Web Mining
Machine Learning and Data Mining (Unit 19)
Prof. Pier Luca Lanzi
References
• Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management Systems (Second Edition)
  • Chapter 10, part 2
• Web Mining Course by Gregory Piatetsky-Shapiro, available at

Mining Text Data: An Introduction
Data Mining / Knowledge Discovery applies to four kinds of data:
• Structured Data
  HomeLoan (
    Loanee: Frank Rizzo
    Lender: MWF
    Agency: Lake View
    Amount: $200,000
    Term: 15 years
  )
• Multimedia
  Loans($200K,[map],...)
• Free Text
  Frank Rizzo bought his home from Lake View Real Estate in 1992. He paid $200,000 under a 15-year loan from MW Financial.
• Hypertext
  <a href>Frank Rizzo</a> Bought <a href>this home</a> from <a href>Lake View Real Estate</a> In <b>1992</b>. <p>...
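
The hypertext and free-text versions above carry the same sentence; only the markup differs. As a minimal sketch of that relationship, the snippet below strips the tags with Python's standard `html.parser` module. The class name `TextExtractor` and the helper `strip_tags` are illustrative names, not part of the lecture material, and the trailing `<p>...` of the slide is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, discarding tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.chunks.append(data)


def strip_tags(html):
    """Return the plain text of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs left behind by the removed markup.
    return " ".join("".join(parser.chunks).split())


hypertext = (
    "<a href>Frank Rizzo</a> Bought <a href>this home</a> "
    "from <a href>Lake View Real Estate</a> In <b>1992</b>."
)
free_text = strip_tags(hypertext)
print(free_text)
```

The link targets and the bold emphasis are lost in the process, which is one reason hypertext is usually treated as a richer source than free text.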
Bag-of-Tokens Approaches
Documents:
  Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or …
Feature Extraction turns each document into a token set:
  nation – 5
  civil – 1
  war – 2
  men – 2
  died – 4
  people – 5
  Liberty – 1
  God – 1
  …
Loses all order-specific information!
Severely limits context!
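
The feature-extraction step can be sketched in a few lines of Python. This is a minimal illustration, assuming lowercasing and a purely alphabetic tokenizer; the function name `bag_of_tokens` is hypothetical. It runs on the excerpt shown above only, so its counts are smaller than the slide's, which refer to the full speech.

```python
import re
from collections import Counter


def bag_of_tokens(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


excerpt = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal. Now we are engaged in a "
    "great civil war, testing whether that nation"
)

bag = bag_of_tokens(excerpt)

# Reversing the word order yields exactly the same bag:
# the representation keeps counts but discards order.
reordered = bag_of_tokens(" ".join(reversed(excerpt.split())))

print(bag["nation"], bag["civil"], bag["war"])
print(bag == reordered)
```

The final comparison makes the slide's warning concrete: a document and any permutation of its words are indistinguishable under this representation.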
Natural Language Processing
Sentence: A dog is chasing a boy on the playground
Lexical analysis (part-of-speech tagging):
  Det Noun Aux Verb Det Noun Prep Det Noun
Syntactic analysis (parsing):
  Noun Phrase: A dog
  Complex Verb: is chasing
  Noun Phrase: a boy
  Prep Phrase: on the playground
  Verb Phrase: is chasing a boy
  Verb Phrase: is chasing a boy on the playground
  Sentence: A dog is chasing a boy on the playground
Semantic analysis:
  Dog(d1).
  Boy(b1).
  Playground(p1).
  Chasing(d1,b1,p1).
Inference:
  Scared(x) if Chasing(_,x,_).
  Combined with the facts above, this yields Scared(b1).
Pragmatic analysis (speech act):
  A person saying this may be reminding another person to get the dog back…
(Taken from ChengXiang Zhai, CS 397cxz – Fall 2003)
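
The inference step above can be illustrated with a small sketch. It assumes facts are stored as (predicate, arguments) tuples and hard-codes the single rule Scared(x) if Chasing(_,x,_); the function name `infer_scared` is hypothetical, and a real system would use a general rule engine rather than one hand-written rule.

```python
# Facts produced by semantic analysis, written as (predicate, arguments) pairs.
facts = {
    ("Dog", ("d1",)),
    ("Boy", ("b1",)),
    ("Playground", ("p1",)),
    ("Chasing", ("d1", "b1", "p1")),
}


def infer_scared(known_facts):
    """Apply the rule Scared(x) if Chasing(_, x, _) once to the given facts."""
    derived = set()
    for predicate, args in known_facts:
        # The rule only looks at the second argument of Chasing;
        # the first and third are wildcards.
        if predicate == "Chasing" and len(args) == 3:
            derived.add(("Scared", (args[1],)))
    return derived


new_facts = infer_scared(facts)
print(new_facts)
```

Applied to the four facts from the slide, the rule derives the single new fact Scared(b1).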