The Evolution of Character Frequency in Chinese Literary Works since the Tang Dynasty
Received: 2010-; Finalized:
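Funding: Beijing Normal University Research Fund for Young Teachers
About the authors: Liu Yufan (19??–), female, lecturer, whose main research interest is natural language processing; Guo Jinzhong (1985–), male, master's student, whose main research interest is complexity theory and its applications; Chen Qinghua (1976–), male, lecturer, whose main research interest is complexity theory and its applications, ******@bnu.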
Liu Yufan1, Guo Jinzhong2, Chen Qinghua2
(, Shijiazhuang, 050031;
, Beijing, 100086)
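Abstract: Studying the distribution of character frequencies in Chinese literary works from different historical periods is important because it supports deeper study of the historical evolution of the Chinese language, yet earlier work on language statistics has lacked such analysis. This paper groups literary works since the Tang Dynasty by period and builds a corpus for each period. The character frequency analysis shows that habits of character use have changed continuously since the Tang Dynasty, and that the closer two periods are in time, the more consistent their character use is. In terms of distribution, the character frequencies of every period are well fitted by a power law with an exponential cutoff; as history progresses, the power-law component weakens steadily while the exponential component strengthens.
Keywords: Chinese literary works; character frequency distribution; power law with exponential cutoff
CLC number: H087, TP391    Document code: A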
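The fitting step named in the abstract can be illustrated with a minimal sketch. It assumes the common rank-frequency form f(r) = C · r^(-alpha) · exp(-lam · r); the paper's exact parametrization and fitting procedure are not given in this section, so the functional form, the log-space least-squares fit, and the helper names `rank_frequencies`, `solve_linear`, and `fit_truncated_power_law` are all illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def rank_frequencies(text):
    """Relative frequencies of CJK characters in text, sorted in descending order.

    Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
    is counted; punctuation and other symbols are ignored.
    """
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    total = sum(counts.values())
    return [n / total for n in sorted(counts.values(), reverse=True)]


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_truncated_power_law(freqs):
    """Fit log f(r) = log C - alpha * log r - lam * r by ordinary least squares.

    freqs must be positive and ordered by rank (rank 1 first).
    Returns (C, alpha, lam). This is an assumed, simplified fitting procedure.
    """
    rows = [(1.0, -math.log(r), -float(r)) for r in range(1, len(freqs) + 1)]
    y = [math.log(f) for f in freqs]
    # Normal equations (A^T A) x = A^T y for the three unknowns.
    ata = [[sum(p[i] * p[j] for p in rows) for j in range(3)] for i in range(3)]
    aty = [sum(p[i] * yi for p, yi in zip(rows, y)) for i in range(3)]
    log_c, alpha, lam = solve_linear(ata, aty)
    return math.exp(log_c), alpha, lam
```

Under this form, a corpus in which the power-law component weakens and the exponential component strengthens would show a smaller fitted `alpha` and a larger fitted `lam`. Fitting in log space weights low-frequency ranks heavily, so a maximum-likelihood fit may be preferable for real corpora.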
The Evolution of Character Using Frequency in Chinese Literature since the Tang Dynasty
LIU Yufan1, GUO Jinzhong2, CHEN Qinghua2
(1. School of Humanities and Social Sciences, Shijiazhuang University of Economics, Shijiazhuang, 050031;
2. School of Management, Beijing Normal University, Beijing, 100875)
Abstract: It is meaningful to study the character frequency distributions of Chinese literary works from different periods, because doing so helps us understand how the Chinese language evolves over time. By counting character frequencies in five bodies of classical and modern Chinese literature, this paper shows that the character frequency distribution has been changing since the Tang Dynasty. Two character frequency distributions are more similar when the time periods they come from are closer, and all the distributions could be well fitted