A Dynamic Dictionary Mechanism Based on Double-Array and PAT Tree Algorithms
Jiang Peng, Huang Degen
(Dalian University of Technology, Dalian, Liaoning 116023)
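Abstract: Existing word segmentation dictionaries run into the problem of out-of-vocabulary (unregistered) words during use. To recognize these words correctly, they must be added to the dictionary, but every existing segmentation dictionary has shortcomings when it comes to adding words. With a static dictionary index, when the number of entries is large, regenerating the index after adding a new word takes a great deal of time. With a dynamic dictionary, adding an entry is cheap, but the dynamic index must be rebuilt every time the dictionary is loaded, and the more entries there are, the longer this takes. This paper combines one dynamic indexing algorithm and one static indexing algorithm into a single dictionary that supports dynamic insertion, deletion, and lookup of entries.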
Keywords: Double-Array; PAT tree; word segmentation dictionary; dynamic insertion
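The static-index side of the design can be illustrated with a minimal double-array trie. The sketch below is an illustration under stated assumptions, not the authors' implementation: it maps each character to a small integer code, packs an ordinary trie into `base`/`check` arrays with a simple first-fit search, and records word ends in a separate set. The class and method names (`DoubleArrayTrie`, `contains`, `prefixes`) are hypothetical. Because construction is a one-shot packing pass, adding a word in this sketch means rebuilding the arrays, which is the cost the abstract attributes to static indexes.

```python
from typing import Dict, List, Set


class DoubleArrayTrie:
    """Static double-array trie (illustrative sketch, not the paper's code).

    A transition from state s on character ch goes to t = base[s] + code[ch]
    and is valid only if check[t] == s. State 0 is the root.
    """

    def __init__(self, words: List[str]) -> None:
        # Assign each distinct character a positive integer code (1, 2, ...).
        alphabet = sorted({ch for w in words for ch in w})
        self.code: Dict[str, int] = {ch: i + 1 for i, ch in enumerate(alphabet)}
        # Build an ordinary nested-dict trie first; it is discarded after packing.
        root: dict = {}
        for w in words:
            node = root
            for ch in w:
                node = node.setdefault(ch, {})
            node[None] = True  # end-of-word marker
        self.base: List[int] = [0]
        self.check: List[int] = [-1]  # -1 marks a slot no child state occupies
        self.terminal: Set[int] = set()  # states at which a word ends
        self._place(0, root)

    def _grow(self, size: int) -> None:
        while len(self.base) < size:
            self.base.append(0)
            self.check.append(-1)

    def _place(self, s: int, node: dict) -> None:
        """Choose base[s] so all children of s land in free slots, then recurse."""
        if None in node:
            self.terminal.add(s)
        codes = sorted(self.code[ch] for ch in node if ch is not None)
        if not codes:
            return
        b = 1  # first-fit: smallest b whose slots b + c are all free
        while True:
            self._grow(b + codes[-1] + 1)
            if all(self.check[b + c] == -1 for c in codes):
                break
            b += 1
        self.base[s] = b
        for c in codes:
            self.check[b + c] = s  # reserve every child slot before recursing
        for ch, child in node.items():
            if ch is not None:
                self._place(b + self.code[ch], child)

    def _step(self, s: int, ch: str) -> int:
        """Return the next state, or -1 if there is no transition."""
        c = self.code.get(ch)
        if c is None:
            return -1
        t = self.base[s] + c
        if t >= len(self.check) or self.check[t] != s:
            return -1
        return t

    def contains(self, word: str) -> bool:
        s = 0
        for ch in word:
            s = self._step(s, ch)
            if s < 0:
                return False
        return s in self.terminal

    def prefixes(self, text: str) -> List[str]:
        """All dictionary words that are prefixes of `text`."""
        found: List[str] = []
        s = 0
        for i, ch in enumerate(text):
            s = self._step(s, ch)
            if s < 0:
                break
            if s in self.terminal:
                found.append(text[: i + 1])
        return found
```

`prefixes` returns every dictionary word that starts at the beginning of the input, the kind of query a dictionary-based segmenter issues at each position of a sentence. Real double-array builders keep a free-slot list instead of the linear first-fit scan used here.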
A Dynamic Dictionary Mechanism Based on Double-Array and PAT Tree Algorithms
Jiang Peng, Huang Degen
(Dalian University of Technology, Dalian, Liaoning 116023)
Abstract: Existing word segmentation dictionaries encounter unregistered (out-of-vocabulary) words in use. To recognize such words correctly, they must be added to the dictionary, yet existing segmentation dictionaries all fall short when words are added. For a static dictionary index with a large number of entries, adding a new word requires regenerating the index, which takes a great deal of time. For a dynamic dictionary, adding an entry is inexpensive, but the dynamic index must be regenerated every time the dictionary is used, and the more entries it holds, the longer this takes. This paper combines one dynamic indexing algorithm and one static indexing algorithm into a single dictionary that supports dynamic insertion, deletion, and lookup of entries.
Keywords: Double-Array
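One way to read the combined design is as a large static index for the base lexicon plus a small dynamic index for words added at run time. The sketch below assumes that division of labor; it is not taken from the paper. To stay short, a sorted list searched with `bisect` stands in for the double-array index, new words go into a character-level compressed (PATRICIA-style radix) trie rather than whatever PAT-tree variant the paper uses, and deleting a base-lexicon word only records a tombstone. All names (`HybridDictionary`, `RadixNode`, `radix_insert`, and so on) are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Iterable, Set, Tuple


class RadixNode:
    """Node of a character-level compressed (PATRICIA-style) trie."""

    def __init__(self) -> None:
        # first character of edge label -> (edge label, child node)
        self.children: Dict[str, Tuple[str, "RadixNode"]] = {}
        self.is_word = False


def _common_prefix_len(a: str, b: str) -> int:
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def radix_insert(node: RadixNode, word: str) -> None:
    while word:
        entry = node.children.get(word[0])
        if entry is None:
            leaf = RadixNode()
            leaf.is_word = True
            node.children[word[0]] = (word, leaf)  # whole remainder on one edge
            return
        label, child = entry
        k = _common_prefix_len(label, word)
        if k < len(label):
            # Split the edge: label[:k] now leads to a new middle node.
            mid = RadixNode()
            mid.children[label[k]] = (label[k:], child)
            node.children[word[0]] = (label[:k], mid)
            child = mid
        node, word = child, word[k:]
    node.is_word = True


def radix_search(node: RadixNode, word: str) -> bool:
    while word:
        entry = node.children.get(word[0])
        if entry is None or not word.startswith(entry[0]):
            return False
        node, word = entry[1], word[len(entry[0]):]
    return node.is_word


def radix_delete(node: RadixNode, word: str) -> bool:
    """Unmark `word`; prune or merge nodes so the trie stays compressed."""
    if not word:
        removed = node.is_word
        node.is_word = False
        return removed
    entry = node.children.get(word[0])
    if entry is None or not word.startswith(entry[0]):
        return False
    label, child = entry
    if not radix_delete(child, word[len(label):]):
        return False
    if not child.is_word and not child.children:
        del node.children[word[0]]  # dead leaf: drop the edge
    elif not child.is_word and len(child.children) == 1:
        [(sub_label, grandchild)] = child.children.values()
        node.children[word[0]] = (label + sub_label, grandchild)  # merge chain
    return True


class HybridDictionary:
    """Static base lexicon plus a dynamic trie for run-time additions."""

    def __init__(self, base_words: Iterable[str]) -> None:
        self._static = sorted(set(base_words))  # stand-in for a double-array index
        self._deleted: Set[str] = set()  # tombstones for removed base words
        self._dynamic = RadixNode()  # holds only words added after construction

    def _in_static(self, word: str) -> bool:
        i = bisect_left(self._static, word)
        return i < len(self._static) and self._static[i] == word

    def contains(self, word: str) -> bool:
        if self._in_static(word):
            return word not in self._deleted
        return radix_search(self._dynamic, word)

    def add(self, word: str) -> None:
        if self._in_static(word):
            self._deleted.discard(word)  # re-adding a base word clears its tombstone
        else:
            radix_insert(self._dynamic, word)

    def remove(self, word: str) -> bool:
        if self._in_static(word):
            was_present = word not in self._deleted
            self._deleted.add(word)  # the static index itself is never rewritten
            return was_present
        return radix_delete(self._dynamic, word)
```

In this arrangement only words added since the static index was built live in the dynamic trie, so whatever has to be rebuilt at load time stays small. Folding those words back into the static index offline would be a natural extension, though the abstract does not say whether the paper does this.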