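Xidian University (西安电子科技大学)
Master's Thesis
Design and Implementation of a Website Real-Time Time-Series Data Acquisition System
Name: Sun Yanan (孙亚南)
Degree applied for: Master
Major: Computer Technology
Supervisors: Jiang Jianguo (姜建国); Fan Aijing (樊爱京)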
20100601
Abstract (translated from the Chinese 摘要)
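With economic and technological progress, the spread of the Internet, and the
development of the information highway, large volumes of data that change in real
time now exist in every corner of society. Some of this data, such as stock prices
and foreign exchange rates, is closely tied to people's daily lives. Although such
data can be watched in real time on websites, the data itself cannot be obtained.
This thesis designs a website time-series data acquisition system to address that
problem.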
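Addressing the shortcomings of current website data acquisition systems, this
thesis analyzes in detail the requirements of a website data acquisition system and
studies in depth the methods for analyzing and extracting website data. On that
basis, it designs and implements a website real-time time-series data acquisition
system, which resolves both the largely blind way in which web page data is obtained
and the fact that the data itself cannot be obtained. The system implements key
functions including automatic URL generation, user-specified data location, fast
web page data collection, data query, and generation of change curves.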
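The focus of the system is establishing general rules for parsing web page data, so
that dynamic data can be collected from most websites. Multithreading is used to
solve the problem of the program interface not responding while a web page is
downloading, and a configuration file is used to solve the problem of having to
redo the settings each time the system restarts. The program uses "utf8" as its
single character encoding. The interface aims to be simple and easy to use: it has
a menu bar, the whole interface has only one button, and all settings are made
through pop-up menus.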
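The program is a C++ project implemented with Qt on Linux, and it is the author's
first attempt at programming on Linux. The system has passed testing; it is fairly
efficient, works fairly stably, and is fairly widely applicable.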
Keywords: real-time data; data acquisition; source code parsing; multithreading
Abstract
With economic and technological development, the popularity of the Internet, and
the development of the information highway, there is a large amount of real-time
data in every corner of society. Some real-time data is closely related to people's
lives, such as stock prices and foreign exchange rates. Although this data can be
observed in real time through websites, the data itself cannot be acquired. In this
paper, a Network Real-time Data Gathering System is designed and implemented to
address this problem.
To address the poor performance of existing Network Data Gathering Systems, the
author has made a detailed requirements analysis of such systems and an in-depth
study of methods for site data analysis and extraction. On this basis, the
real-time time-series data acquisition system is designed and implemented. The
paper solves the difficulty of obtaining the changing data on web pages. Finally,
the author has implemented automatic URL generation, user data location, rapid
collection of Web data, data query, curve generation, and other important functions.
The focus of the program is to establish common rules of data analysis, so that it
can get dynamic data from a majority of sites. Multithreading is used to resolve the
problem that the system does not respond while th