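Shanghai Jiao Tong University
Master's Thesis
The Design and Implementation of a Content-Based Anti-Spam Email System
Name: 程卫华 (Cheng Weihua)
Degree applied for: Master
Major: Software Engineering
Supervisors: 尤晋元 (You Jinyuan); 张鸿钧 (Zhang Hongjun)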
20070401
The Design and Implementation of a Content-Based Anti-Spam Email System
Abstract
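With the rapid development of information networks, e-mail has become a
fast and convenient means of communication that is deeply embedded in
people's daily work and life. At the same time, the growing flood of spam
poses a serious threat worldwide, has harmful effects, and has drawn broad
public concern. We call on the parties concerned to adopt progressively
comprehensive and effective measures from the legislative, administrative,
and standards perspectives; for now, however, the main defense is still
anti-spam technology.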
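To reduce misclassification and adapt better to the highly variable and
diverse forms of spam, this thesis studies content-based anti-spam
filtering techniques, chiefly Bayesian probability statistics, distributed
checksum exchange, and heuristic analysis and detection, and uses them to
design and implement an anti-spam system. Within this work, providing and
improving an MTA-layer filtering interface, an MDA-layer filtering
interface, and a user feedback mechanism strengthens the defenses of the
mail system as a whole and supports personalized anti-spam control. In
addition, two different mail system architectures are analyzed and
validated against actual deployment needs, and their advantages and
limitations are compared.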
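Finally, this thesis studies evaluation methods and metrics for anti-spam
systems and evaluates the system using a widely collected set of mail
samples and K-fold cross-validation. The experimental results show that
the system achieves good results in applying the improved content-filtering
techniques.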
Keywords: anti-spam, content filtering, Bayesian algorithm, Distributed
Checksum Clearinghouse, heuristic analysis and detection
THE DESIGN AND IMPLEMENTATION OF
CONTENT-BASED ANTI-SPAM EMAIL SYSTEM
ABSTRACT
With the rapid expansion of the network, e-mail, as a quick and
convenient way to communicate, has gained ever wider acceptance in
people's daily work and life. Meanwhile, the flood of junk mail poses a
serious threat and has a negative impact worldwide, which has drawn broad
attention. Although an effective, comprehensive solution should involve
legislation, administration, and better specifications, for now the most
practical remedy remains anti-spam technology.
This thesis aims to lower the false-positive rate of anti-spam systems
and to adapt them better to the many types and variations of junk mail.
To this end, the research focuses on content-based filtering technology,
including Bayesian statistics, the Distributed Checksum Clearinghouse, and
heuristic analysis and detection. An anti-spam system is built using
these methods. During the research, the filtering interfaces of the MTA
layer and the MDA layer are improved, a user feedback mechanism is
provided, and personalized control is supported; all of these
contribute to strengthening the defense
capability of our