David A. Smith (JHU → UMass Amherst)
Jason Eisner (Johns Hopkins University)
Dependency Parsing by Belief Propagation
Outline
Edge-factored parsing
Dependency parses
Scoring the competing parses: Edge features
Finding the best parse
Higher-order parsing
Throwing in more features: Graphical models
Finding the best parse: Belief propagation
Experiments
Conclusions
Word Dependency Parsing
He reckons the current account deficit will narrow to only 1.8 billion in September.
Raw sentence
Part-of-speech tagging
He reckons the current account deficit will narrow to only 1.8 billion in September.
PRP VBZ DT JJ NN NN MD VB TO RB CD CD IN NNP .
POS-tagged sentence
Word dependency parsing
slide adapted from Yuji Matsumoto
Word dependency parsed sentence
He reckons the current account deficit will narrow to only 1.8 billion in September .
[Figure: dependency arcs over the sentence, labeled ROOT, SUBJ, S-COMP, SPEC, MOD, COMP]
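A labeled dependency parse like the one above is commonly stored as a head index and arc label per token. A minimal sketch (the encoding is illustrative, not from the slides; only the two arcs unambiguously recoverable from the figure are filled in):

```python
# Minimal sketch of a labeled dependency parse as parallel structures.
# Token indices are 1-based; head 0 denotes the artificial ROOT node.
tokens = ["He", "reckons", "the", "current", "account", "deficit",
          "will", "narrow", "to", "only", "billion", "in", "September", "."]

heads = {1: 2, 2: 0}              # He <- reckons; reckons <- ROOT
labels = {1: "SUBJ", 2: "ROOT"}   # arc labels from the slide

def arcs(tokens, heads, labels):
    """Return (child, label, head) triples in sentence order."""
    out = []
    for child in sorted(heads):
        head = heads[child]
        head_word = "ROOT" if head == 0 else tokens[head - 1]
        out.append((tokens[child - 1], labels[child], head_word))
    return out

print(arcs(tokens, heads, labels))
# [('He', 'SUBJ', 'reckons'), ('reckons', 'ROOT', 'ROOT')]
```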
What does parsing have to do with belief propagation?
loopy belief propagation
Great ideas in NLP: Log-linear models (Berger, della Pietra, della Pietra 1996; Darroch & Ratcliff 1972)
In the beginning, we used generative models.
p(A) * p(B | A) * p(C | A,B) * p(D | A,B,C) * …
each choice depends on a limited part of the history
but which dependencies to allow?
what if they’re all worthwhile?
p(D | A,B,C)?
… p(D | A,B) * p(C | A,B,D)?
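The chain-rule decomposition on this slide can be made concrete with a toy example (all probability values below are invented for illustration):

```python
# Toy generative model: p(A,B,C,D) = p(A) * p(B|A) * p(C|A,B) * p(D|A,B,C).
# Each factor conditions on (part of) the history; values are invented.
p_A           = 0.5
p_B_given_A   = 0.4
p_C_given_AB  = 0.3
p_D_given_ABC = 0.2

joint = p_A * p_B_given_A * p_C_given_AB * p_D_given_ABC
print(round(joint, 6))  # 0.012

# Limiting the history (e.g. p(D|A,B) instead of p(D|A,B,C)) changes
# which dependencies the model can capture -- the question on the slide.
```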

Solution: Log-linear (max-entropy) modeling
Features may interact in arbitrary ways
Iterative scaling keeps adjusting the feature weights until the model agrees with the training data.
p(A) * p(B | A) * p(C | A,B) * p(D | A,B,C) * …
which dependencies to allow? (given limited training data)
Log-linear replacement: p(A,B,C,D,…) = (1/Z) · exp Σᵢ θᵢ fᵢ(A,B,C,D,…)
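The (1/Z)-normalized log-linear model on this slide can be sketched concretely. The variables, feature functions, and weights below are invented for illustration; the point is that arbitrarily overlapping features are fine because the global normalizer Z turns the scores into a distribution:

```python
import math
from itertools import product

# Sketch of a log-linear (max-entropy) model over four binary variables.
# Feature functions and weights are invented for illustration.
def features(a, b, c, d):
    return [a == b, b == c, a and d]   # overlapping, arbitrary features

weights = [1.5, -0.5, 2.0]

def score(config):
    # exp(theta . f): unnormalized score of one configuration
    return math.exp(sum(w * f for w, f in zip(weights, features(*config))))

configs = list(product([0, 1], repeat=4))
Z = sum(score(c) for c in configs)        # the (1/Z) normalizer
p = {c: score(c) / Z for c in configs}    # p = (1/Z) * exp(theta . f)

print(abs(sum(p.values()) - 1.0) < 1e-12)  # True: probabilities sum to one
```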