Part-of-Speech Tagging of Chinese Students' English Essays Based on Conditional Random Fields
Wu Kun, Tan Yongmei**
(School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876)
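Abstract: Part-of-speech (POS) tagging is an important research topic in natural language processing (NLP). Almost every NLP application relies on it, and it is a basic component of English essay scoring and correction systems. Applying machine learning techniques to POS tagging of English text requires an appropriately annotated corpus. In this paper we manually annotate English essays written by Chinese students and propose a POS tagging method aimed at such essays. The method improves tagging accuracy by performing unsupervised word clustering over the words of a large unlabeled corpus. Experiments on the annotated corpus show that the method effectively improves tagging accuracy.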
Keywords: part-of-speech tagging; student English essays; features; word clustering
CLC number: TP391
Part-of-Speech tagging for Chinese English learner language
based on CRF
Wu Kun, Tan Yongmei
(Computer School, Beijing University of Posts and Telecommunications, Beijing 100876)
Abstract: Part-of-Speech (POS) tagging is an important component of almost all Natural Language
Processing (NLP) applications, and it is a basic component of automated essay scoring systems.
Applying machine-learning techniques to POS tagging of English text requires an appropriately
tagged corpus. In this paper, we manually annotate essays written by Chinese learners of English and propose a
POS tagging method for Chinese English learner language. The method combines unsupervised word clusters
with basic features in the POS tagger. Experiments were carried out on the manually annotated corpus, and
the results show that the method effectively improves tagging accuracy.
Key words: Part-of-Speech tagging; learner essays; feature; word clustering
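0 Introduction
As the number of English learners grows rapidly, analyzing the texts they write is becoming increasingly important. POS tagging is the process of determining the grammatical category, that is, the part of speech, of each word in a given sentence [1]. POS tagging is one of the most important preprocessing tasks in natural language processing; subsequent chunking, syntactic parsing, and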