Text and Web Mining
Machine Learning and Data Mining (Unit 19)
Prof. Pier Luca Lanzi
References
Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management Systems (Second Edition), Chapter 10, part 2
Gregory Piatetsky-Shapiro, Web Mining Course, available at
Mining Text Data: An Introduction
Data Mining / Knowledge Discovery works on four kinds of data. The same loan appears in each form:

Structured data:
HomeLoan (
  Loanee: Frank Rizzo
  Lender: MWF
  Agency: Lake View
  Amount: $200,000
  Term: 15 years
)

Multimedia:
Loans($200K,[map],...)

Free text:
Frank Rizzo bought his home from Lake View Real Estate in 1992. He paid $200,000 under a 15-year loan from MW Financial.

Hypertext:
<a href>Frank Rizzo</a> Bought <a href>this home</a> from <a href>Lake View Real Estate</a> In <b>1992</b>. <p>...
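As a toy illustration of why free text is harder to mine than the structured record, the following sketch recovers the HomeLoan fields from the free-text sentence with a hand-written regular expression. The `HomeLoan` class, its field names, the pattern, and `extract_loan` are all hypothetical; the pattern fits only this one sentence shape, which is exactly the brittleness that motivates dedicated text-mining techniques.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class HomeLoan:
    # Hypothetical field names mirroring the structured record on the slide.
    loanee: str
    lender: str
    agency: str
    amount: int
    term_years: int
    year: int


# Hypothetical pattern tailored to one sentence shape; real text varies far more.
PATTERN = re.compile(
    r"(?P<loanee>[A-Z]\w+ [A-Z]\w+) bought his home from (?P<agency>.+?) "
    r"in (?P<year>\d{4})\. He paid \$(?P<amount>[\d,]+) under a "
    r"(?P<term>\d+)-year loan from (?P<lender>[^.]+)"
)


def extract_loan(text: str) -> Optional[HomeLoan]:
    """Return a HomeLoan if the sentence matches the pattern, else None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return HomeLoan(
        loanee=m["loanee"],
        lender=m["lender"],
        agency=m["agency"],
        amount=int(m["amount"].replace(",", "")),  # "$200,000" -> 200000
        term_years=int(m["term"]),
        year=int(m["year"]),
    )


text = ("Frank Rizzo bought his home from Lake View Real Estate in 1992. "
        "He paid $200,000 under a 15-year loan from MW Financial.")
print(extract_loan(text))
```

Rephrasing the sentence slightly (for example "In 1992, Frank Rizzo bought...") makes `extract_loan` return `None`, which is why the slide treats free text as a separate, harder kind of data.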
Bag-of-Tokens Approaches
Documents → Feature Extraction → Token Sets

Document:
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or …

Token set (token: count):
nation: 5
civil: 1
war: 2
men: 2
died: 4
people: 5
Liberty: 1
God: 1
…
Loses all order-specific information!
Severely limits context!
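A minimal sketch of bag-of-tokens feature extraction, assuming simple choices not stated on the slide: text is lowercased, split on non-letter characters, and no stop words are removed. The function name `bag_of_tokens` is hypothetical. The counts are computed on the first sentence only, so they will not match the slide's token set, which covers a longer text.

```python
import re
from collections import Counter


def bag_of_tokens(text: str) -> Counter:
    """Lowercase, keep runs of letters as tokens, and count them.

    Only token frequencies survive; the order of the words is discarded.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


doc = ("Four score and seven years ago our fathers brought forth on this "
       "continent, a new nation, conceived in Liberty, and dedicated to the "
       "proposition that all men are created equal.")
bag = bag_of_tokens(doc)
print(bag["nation"], bag["liberty"])

# Order is lost: two sentences with opposite meanings give the same bag.
print(bag_of_tokens("the dog bit the man") == bag_of_tokens("the man bit the dog"))
```

The last comparison prints `True`, which is the loss of order-specific information and context that the slide warns about.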
Natural Language Processing
Sentence: A dog is chasing a boy on the playground

Lexical analysis (part-of-speech tagging):
A/Det dog/Noun is/Aux chasing/Verb a/Det boy/Noun on/Prep the/Det playground/Noun

Syntactic analysis (parsing):
[Sentence [Noun Phrase A dog] [Verb Phrase [Verb Phrase [Complex Verb is chasing] [Noun Phrase a boy]] [Prep Phrase on the playground]]]

Semantic analysis:
Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).

Inference:
Facts above + rule Scared(x) if Chasing(_,x,_). ⇒ Scared(b1)

Pragmatic analysis (speech act):
A person saying this may be reminding another person to get the dog back…

(Taken from ChengXiang Zhai, CS 397cxz, Fall 2003)
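The inference step can be sketched as one round of forward chaining over the facts produced by semantic analysis. Encoding each fact as a `(predicate, arg1, ...)` tuple and the function name `apply_scared_rule` are assumptions made for illustration; the rule itself, Scared(x) if Chasing(_,x,_), is the one on the slide.

```python
from typing import Set, Tuple

Fact = Tuple[str, ...]  # hypothetical encoding: (predicate, arg1, arg2, ...)

# Facts from the semantic analysis step.
facts: Set[Fact] = {
    ("Dog", "d1"),
    ("Boy", "b1"),
    ("Playground", "p1"),
    ("Chasing", "d1", "b1", "p1"),
}


def apply_scared_rule(kb: Set[Fact]) -> Set[Fact]:
    """Scared(x) if Chasing(_, x, _): whoever is being chased is scared.

    The chaser and the location are ignored (they play the role of '_').
    """
    derived = {("Scared", f[2]) for f in kb if f[0] == "Chasing" and len(f) == 4}
    return kb | derived


kb = apply_scared_rule(facts)
print(("Scared", "b1") in kb)
```

A general system would repeat rule application until no new facts appear; with a single rule and no chained conclusions, one pass is enough here.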