
下载得到文件列表

基于SVM的web分类方案设计与研究.doc

文档介绍

文档介绍:
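Design and Research of a Web Classification Scheme Based on SVM
Ye Ling, Xin Yang
(Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100876)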
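Abstract: With the rapid development of network technology, the Internet has become a
necessity of daily life. Hundreds of millions of web pages bring convenience but also cause
difficulty: how to select the desired pages efficiently, quickly, and accurately, and how to
filter out junk pages, have become problems that urgently need solving. Text is one of the
most commonly used carriers of information, and how to process text information has become a
popular research direction in fields such as information retrieval and acquisition. Text
classification was originally done by human judgment, but with the rapid growth in the number
of web pages, manual classification is unrealistic in a big-data environment. This has made
automatic text classification a new research field, one that has gradually become closely
integrated with search engines, information filtering, and related areas, and an important
means of processing text information. This paper compares the strengths and weaknesses of the
two currently most popular algorithms, KNN and SVM, and, taking into account factors such as
sample size and features, selects the SVM classification algorithm as the experimental
algorithm. It presents the overall design and the design of each module for a classification
scheme, implements a classification system, and analyzes the performance of that system.
Keywords: information security; text classification; SVM algorithm; KNN algorithm; feature selection
CLC number: TP319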
DESIGN AND RESEARCH OF THE WEB
CLASSIFICATION SCHEME BASED ON SVM
Ye Ling, Xin Yang
(Information Security Center, Beijing University of Posts and Telecommunications, Beijing
100876)
Abstract: With the rapid development of network technology, the network has become an
everyday necessity. Hundreds of millions of web pages bring people convenience while also
troubling them, and how to choose the ideal web pages efficiently, quickly, and accurately has
become a serious problem. How to deal with text information has become a hot research
direction in the fields of information search and acquisition, with text having become one of
the most popular carriers. Initially, text classification relied on human judgment; however,
with the rapid growth in the number of pages, relying on manual classification alone is not
realistic in a large-data environment. This has made automatic text classification a new
research area, one that has gradually become closely integrated with search engines and
information filtering and an important means of handling textual information. By comparing
the advantages and disadvantages of the two currently most popular algorithms, the KNN
algorithm and the SVM algorithm, and considering sample size, features, and other factors, we
ultimately choose the SVM classification algorithm as the experimental algorithm. For a
classification scheme, we complete the overall design and the design of each module,
implement a classification system, and carry out a performance analysis of it.
Key words: Information security; Tex