Journal of Software (软件学报), ISSN 1000-9825, CODEN RUXUEW E-mail: ******@iscas.
Journal of Software,2011,22(2):233−244 [doi: .]
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel/Fax: +86-10-62562563
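Pronoun Resolution in Spoken Dialog∗
费仲超 1,2+, 周雅倩 1, 黄萱菁 1, 吴立德 1
1(School of Computer Science and Technology, Fudan University, Shanghai 200433)
2(Product Line Strategy and Technology Leadership Department, Shanghai Bell Co., Ltd., Shanghai 201206)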
Pronoun Resolution in Spoken Dialog
FEI Zhong-Chao1,2+, ZHOU Ya-Qian1, HUANG Xuan-Jing1, WU Li-De1
1(School of Computer Science, Fudan University, Shanghai 200433, China)
2(Portfolio Strategy and Technology Leadership CTO Group, Alcatel-Lucent Shanghai Bell, Shanghai 201206, China)
+ Corresponding author: E-mail: ******@fudan.
Fei ZC, Zhou YQ, Huang XJ, Wu LD. Pronoun resolution in spoken dialog. Journal of Software, 2011,22(2):
233−244. /1000-9825/
Abstract: This paper presents a two-stage pronoun resolution algorithm that requires neither manual cleaning of
the test corpus nor manually predefined patterns. In the first stage, new features and machine learning methods
are used to classify pronouns as anaphoric or non-anaphoric. In the second stage, the two kinds of pronouns are
resolved separately. For anaphoric pronouns, methods are presented for extracting distance, syntactic, and
semantic features, among others. For non-anaphoric pronouns, the Right Frontier Rule is adapted to perform the
resolution. Tested on the corpus published by Byron in 2004, the algorithm achieves a precision of %
and a recall of %. Compared with Byron's work, the algorithm is fully automatic and its results are much
better.
Key words: pronoun resolution; spoken dialog understanding; pronoun classification
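The two-stage structure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Candidate` and `Pronoun` types, the `distance` and `is_noun_phrase` fields, the toy keyword rule standing in for the learned classifier, and the nearest-candidate heuristics standing in for the feature-based resolver and the adapted Right Frontier Rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str             # surface form of the candidate antecedent
    distance: int         # hypothetical distance feature (utterances back)
    is_noun_phrase: bool  # hypothetical flag: NP vs. clause-like candidate


@dataclass
class Pronoun:
    text: str
    candidates: List[Candidate]


def classify(pronoun: Pronoun) -> str:
    """Stage 1 stand-in: a toy keyword rule, not the paper's learned classifier."""
    # Illustrative assumption only: route demonstratives to the second class.
    if pronoun.text.lower() in {"this", "that"}:
        return "non_anaphoric"
    return "anaphoric"


def resolve_anaphoric(pronoun: Pronoun) -> Optional[Candidate]:
    """Stage 2a stand-in: choose the nearest noun-phrase candidate."""
    nps = [c for c in pronoun.candidates if c.is_noun_phrase]
    return min(nps, key=lambda c: c.distance, default=None)


def resolve_non_anaphoric(pronoun: Pronoun) -> Optional[Candidate]:
    """Stage 2b stand-in: choose the nearest non-NP candidate."""
    others = [c for c in pronoun.candidates if not c.is_noun_phrase]
    return min(others, key=lambda c: c.distance, default=None)


def resolve(pronoun: Pronoun) -> Optional[Candidate]:
    """Route each pronoun to the resolver for its predicted class."""
    if classify(pronoun) == "anaphoric":
        return resolve_anaphoric(pronoun)
    return resolve_non_anaphoric(pronoun)
```

With two hypothetical candidates, "the engine" (a noun phrase) and "we moved the boxcar" (a clause), `resolve` returns the noun phrase for "it" and the clause for "that". The point of the sketch is only the routing: classification happens first, and each class gets its own resolver.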
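Chinese abstract: This paper proposes a two-step pronoun resolution algorithm. The first step uses new
features and machine learning to classify pronouns into those with noun-phrase antecedents and those with
non-noun-phrase antecedents (non-anaphoric); the second step resolves the two classes separately.
For pronouns with noun-phrase antecedents, feature extraction and representation methods suited to spoken dialog
are proposed, covering the distance between a pronoun and its candidate antecedents as well as syntactic and
semantic features. For pronouns with non-noun-phrase antecedents, the right frontier rule is adapted into a form
that can be extracted automatically from spoken dialog. Tested on the corpus released by Byron in 2004, the
algorithm achieves a resolution precision of % and a recall of %. Compared with Byron's work,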