Abstract
Part of speech is the most fundamental attribute of a word. It not only provides the knowledge base for syntactic and grammatical analysis, but also supplies decision information for part-of-speech (POS) tagging and other natural language tasks. POS tagging is the process of labeling each word with its part of speech, and it has been studied extensively as a basic task in the field of natural language processing.
The results of POS tagging directly affect the accuracy of subsequent natural language tasks. Current approaches to improving POS tagging rely on statistical models and rule-based methods. Among the statistical models used for POS tagging, the most common is the hidden Markov model (HMM). Because of certain grammatical characteristics unique to Chinese, HMM-based POS tagging often runs into problems such as data sparseness, homonym ambiguity, and unlisted words. In the course of continued research on the HMM, many scholars have proposed tagging methods that combine neural networks, rule bases, and finite state machines with the traditional HMM. These have evolved into new POS tagging methods and point to new directions for improving tagging performance.
Based on a study of the characteristics of Chinese POS tagging, this study optimizes Chinese POS tagging in order to improve tagging accuracy. First, after studying how statistical models and traditional neural networks are applied to Chinese POS tagging, and after analyzing the characteristics of the neural network and of the traditional hidden Markov model in POS tagging, a BP-HMM model is constructed that combines the advantages of both. The new model integrates context information more effectively and thereby improves the accuracy of POS tagging. Secondly, because the traditional smoothing algorithm cannot meet the needs