Capital Normal University, Bachelor's Thesis in Computer Science and Technology

Research on Web-Based Text Classification Mining
Chinese Abstract

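The Internet has become an enormous source of information. How to make that information serve people better, and how to obtain the information one needs quickly and accurately, are important problems we face. Web-based information processing has therefore become an active research area, and the study of text classification methods on the Web is one of the key topics in Web data mining.
This thesis introduces the theory of data mining, Web mining, and text classification; analyzes the characteristics of Web data; and compares HTML with traditional data. It examines several text classification algorithms, concentrating on the naive Bayes classification algorithm and the concrete process of improving it. It attempts to use HTML tag weights to compensate for the weakness of the conditional independence assumption in naive Bayes. It briefly reviews existing approaches to filtering the tags in Web pages, and combines the useful information in those tags with a text classification algorithm to classify text. Finally, since the precision of the improved classifier is not yet satisfactory, the thesis summarizes what remains to be studied and offers some of the author's own views.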
Keywords
Web mining; naive Bayes; data mining; text classification; Web page tags
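
As a rough illustration of the idea summarized above, the following minimal sketch shows one way HTML tag weights could enter a multinomial naive Bayes classifier: a term's count is scaled by the weight of the tag that encloses it. This is not the thesis's actual scheme, which is not shown in this excerpt. The weight table `TAG_WEIGHTS`, the class name `TagWeightedNB`, the Laplace smoothing parameter `alpha`, and the regex tokenizer are all assumptions made for illustration; Chinese text would need a proper word segmenter in place of the regex.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical tag weights (illustrative values, not taken from the thesis).
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}


class WeightedTermCounter(HTMLParser):
    """Counts terms, scaling each by the largest weight among its open tags."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags
        self.counts = Counter()  # term -> weighted count

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.stack), default=1.0)
        # Crude tokenizer: an assumption standing in for real word segmentation.
        for term in re.findall(r"\w+", data.lower()):
            self.counts[term] += weight


def weighted_counts(html):
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


class TagWeightedNB:
    """Multinomial naive Bayes over tag-weighted term counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.term_totals = defaultdict(Counter)  # class -> term -> weighted count
        self.doc_counts = Counter()              # class -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for html, label in zip(docs, labels):
            counts = weighted_counts(html)
            self.term_totals[label].update(counts)
            self.doc_counts[label] += 1
            self.vocab.update(counts)
        return self

    def predict(self, html):
        counts = weighted_counts(html)
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for label in self.doc_counts:
            total = sum(self.term_totals[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            for term, w in counts.items():
                p = (self.term_totals[label][term] + self.alpha) / denom
                score += w * math.log(p)  # weighted log-likelihood
            if score > best_score:
                best, best_score = label, score
        return best


train_docs = [
    "<title>football match</title><p>the team won the game</p>",
    "<title>python programming</title><p>the code runs on the computer</p>",
]
clf = TagWeightedNB().fit(train_docs, ["sports", "tech"])
prediction = clf.predict("<title>football</title><p>the computer</p>")
```

In this toy run the title term "football" counts three times as much as the body term "computer", so the page is assigned to the sports class. Whether such weighting helps on real data depends on how the weights are chosen, which is an empirical question.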
Research on Web-Based Text Classification Mining
ABSTRACT
The Internet has become a great source of information. How to make that information serve people better, and how to obtain it quickly and accurately, are important issues we must confront. Research on Web-based information processing is now a hot topic, and text categorization on the Web has become one of the key areas of Web mining research.
This thesis introduces the theory of data mining, Web mining, and text classification, analyzes the features of Web data, and compares HTML with traditional data. It analyzes several text categorization algorithms, with emphasis on the naive Bayes classifier and the concrete process of improving it. The thesis tries to use HTML tag weights to mitigate the main weakness of the naive Bayes classifier, its conditional independence assumption. In implementing the classifier, the thesis summarizes existing methods for filtering HTML tags, then uses the information carried by the tags together with a text categorization algorithm to classify text.
Finally, because the precision of the improved classifier is not yet ideal, the thesis summarizes the work that remains for this subject and presents some of the author's own views.
Xu Ying
Directed by Liu Li-zhen
Keywords
Web mining; Naïve Bayes; data mining; text categorization; HTML tags
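Contents
Chinese Abstract
English Abstract
Chapter 1 Introduction
Background and Significance of the Topic
Data Mining
Web Mining
Current State and Development of Web Mining Research
Main Research Content and Organization of This Thesis
Chapter 2 Web-Based Text Classification Mining
Introduction
Preprocessing of Web Text
Web Text Data Collection
Text Word Segmentation
Text Feature Library
Text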