Machine Learning for Information Extraction in
Informal Domains
Dayne Freitag
November, 1998
CMU-CS-99-104
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
Thesis Committee:
Tom Mitchell, Chair
Jaime Carbonell
David Evans
Oren Etzioni, University of Washington
© 1998 Dayne Freitag
This research was sponsored by Wright Laboratory, Aeronautical Systems Center, under grant number
F33615-93-1-1330 and Rome Laboratory under grant number F30602-97-1-0215, both of the Air Force Materiel
Command, USAF, and by the Defense Advanced Research Projects Agency (DARPA). Part of this
research was conducted during a summer internship at Justsystem Pittsburgh Research Center.
The views and conclusions contained in this document are those of the author and should not be
interpreted as representing the official policies, either expressed or implied, of any sponsoring party or
the US Government.
Keywords: machine learning, information extraction, information retrieval, multistrategy learning
Abstract
Information extraction, the problem of generating structured summaries of human-oriented
text documents, has been studied for over a decade now, but the primary emphasis has been
on document collections characterized by well-formed prose (e.g., newswire articles). Solutions
have often involved the hand-tuning of general natural language processing systems
to a particular domain. However, such solutions may be difficult to apply to “informal” domains,
domains based on genres characterized by syntactically unparsable text and frequent
out-of-lexicon terms. With the growth of the Internet, such genres, which include email
messages, newsgroup posts, and Web pages, are particularly abundant, and there is no lack
of potential information extraction applications. Examples include a program to extract
names from personal home pages, or a system that monit