Natural Language Annotation for Machine Learning
James Pustejovsky and Amber Stubbs
Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo
Natural Language Annotation for Machine Learning
by James Pustejovsky and Amber Stubbs
Revision History:
2012-03-06 Early release revision 1
2012-03-26 Early release revision 2
See /catalog/?isbn=9781449306663 for release details.
ISBN: 978-1-449-30666-3
Table of Contents
Preface
1. The Basics
  The Importance of Language Annotation
  The Layers of Linguistic Description
  What is Natural Language Processing?
  A Brief History of Corpus Linguistics
  What is a Corpus?
  Early Use of Corpora
  Corpora Today
  Kinds of Annotation
  Language Data and Machine Learning
  Classification
  Clustering
  Structured Pattern Induction
  The Annotation Development Cycle
  Model the phenomenon
  Annotate with the Specification
  Train and Test the algorithms over the corpus
  Evaluate the results
  Revise the Model and Algorithms
  Summary
2. Defining Your Goal and Dataset
  Defining a goal
  The Statement of Purpose
  Refining your Goal: Informativity versus Correctness
  Background research
  Language Resources
  Organizations and Conferences
  NLP Challenges
  Assembling your dataset
  Collecting data from the
  Eliciting data from people
  Preparing your data for annotation
  Metadata
  Pre-processed data
  The size of your corpus
  Existing Corpora
  Distributions within corpora
  Summary
3. Buildin