Corpus Markup and Annotation
Li Wenzhong
2012
Recommended reading:
《语料库应用教程》 [A Course in Corpus Applications], "Metadata Annotation", -44
Definition: Markup
Recording and adding external information to a corpus
Text as data
Metadata
Meta-metadata
Markup language: XML
Paired tags
Opening tag: <>
Closing tag: </>
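The idea of paired tags carrying metadata can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed corpus schema: the element names `text`, `header`, `author`, `year` and `body`, and the sample values, are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and values; the slides do not prescribe a schema.
text = ET.Element("text")
header = ET.SubElement(text, "header")            # metadata: data about the text
ET.SubElement(header, "author").text = "Example Author"
ET.SubElement(header, "year").text = "2012"
body = ET.SubElement(text, "body")                # the text itself, treated as data
body.text = "Markup records external information about a text."

# Serialise: every opening tag is closed by a matching closing tag.
xml_string = ET.tostring(text, encoding="unicode")
print(xml_string)

# Read the metadata back from the markup.
parsed = ET.fromstring(xml_string)
metadata = {child.tag: child.text for child in parsed.find("header")}
print(metadata)
```

Keeping the metadata in a separate header element means the body text can be analysed without the external information getting mixed into word counts.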
Definition: Annotation
Corpus annotation is the process of adding information to a corpus (Hunston, 2002:79)
This information is designed to interpret the corpus linguistically (Leech, 1997:2)
The term ‘annotation’ is used to cover tagging, parsing and other forms of annotation
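To make "adding information to a corpus" concrete, here is a toy sketch of tagging in Python. The lexicon, the Penn Treebank-style tags and the `word_TAG` output format are all assumptions for illustration; a real tagger uses a trained model rather than a lookup table.

```python
# Toy lexicon: a hypothetical stand-in for a real tagger's model.
TOY_LEXICON = {
    "the": "DT",
    "big": "JJ",
    "red": "JJ",
    "ball": "NN",
    "bounces": "VBZ",
}


def annotate(tokens, lexicon, unknown_tag="UNK"):
    """Pair each token with a tag looked up in the lexicon."""
    return [(tok, lexicon.get(tok.lower(), unknown_tag)) for tok in tokens]


def to_word_tag(pairs, sep="_"):
    """Render (word, tag) pairs in a word_TAG format."""
    return " ".join(f"{word}{sep}{tag}" for word, tag in pairs)


tokens = "The big red ball bounces".split()
tagged = annotate(tokens, TOY_LEXICON)
tagged_text = to_word_tag(tagged)
print(tagged_text)  # The_DT big_JJ red_JJ ball_NN bounces_VBZ
```

The original words are left untouched; the annotation is an added layer that can later be searched or stripped.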
Hands-on practice
Metadata encoder:
file:\\Tools\02标注工具\Metadata_Encoder
TreeTagger:
file:\\2012workshop\Tools\02标注工具\treetagger
PowerGREP
Tag retrieval using Regex
retrieve data
<author></author>
words and their POS
(Adj)+ N
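Both retrieval tasks above can be tried out with Python's `re` module. This is a sketch under stated assumptions: the sample strings are invented, the POS-tagged text is assumed to use a `word_TAG` format with Penn Treebank-style tags (`JJ`, `JJR`, `JJS` for adjectives, `NN`, `NNS` for common nouns), and the patterns would need adjusting for another tagset or for a tool's own regex dialect.

```python
import re

# Task 1: retrieve the content of <author>...</author> elements.
# Invented sample; real corpus headers may be structured differently.
marked_up = (
    "<text><header><author>Example Author</author><year>2012</year></header>"
    "<body>First text.</body></text>\n"
    "<text><header><author>Another Author</author></header>"
    "<body>Second text.</body></text>"
)
AUTHOR_RE = re.compile(r"<author>(.*?)</author>", re.DOTALL)
authors = AUTHOR_RE.findall(marked_up)
print(authors)

# Task 2: retrieve (Adj)+ N sequences from POS-tagged text.
# Assumed format: word_TAG tokens separated by whitespace.
tagged = (
    "The_DT big_JJ red_JJ ball_NN hit_VBD a_DT small_JJ window_NN "
    "near_IN trees_NNS"
)
ADJ_NOUN_RE = re.compile(
    r"(?<!\S)"                # start at a token boundary
    r"(?:\S+_JJ[RS]?\s+)+"    # one or more adjectives
    r"\S+_NNS?"               # followed by one common noun
    r"(?!\S)"                 # end at a token boundary
)
matches = ADJ_NOUN_RE.findall(tagged)

# Strip the tags to show only the words of each match.
phrases = [
    " ".join(token.rsplit("_", 1)[0] for token in match.split())
    for match in matches
]
print(phrases)
```

The non-greedy `(.*?)` stops at the first closing tag, so two author elements on one line are not merged into a single match.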
Thanks for your attention!
Dr. Li Wenzhong
National Research Centre for Foreign Language Education (中国外语教育研究中心)