Document name: Data Mining (datamining_intro.ppt)

Format: ppt   Size: 3,293 KB   Pages: 31

Uploaded by: mh900965, 2017/6/26. File size: 3.22 MB


Data Mining: Introduction
Why Mine Data? Commercial Viewpoint
Lots of data is being collected and warehoused
Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions
Computers have become cheaper and more powerful
Competitive pressure is strong
Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint
Data collected and stored at enormous speeds (GB/hour)
remote sensors on a satellite
telescopes scanning the skies
microarrays generating gene expression data
scientific simulations generating terabytes of data
Traditional techniques infeasible for raw data
Data mining may help scientists
in classifying and segmenting data
in Hypothesis Formation
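As a rough illustration of what "segmenting data" can mean, the sketch below groups one-dimensional measurements with a tiny k-means loop. The choice of k-means, the function name `kmeans_1d`, and the sample readings are assumptions made for illustration; the slides do not name a specific method.

```python
def kmeans_1d(values, centers, iterations=10):
    """Segment numbers into groups around the given starting centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assign every value to its nearest current center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical sensor readings that fall into two obvious segments.
readings = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centers, groups = kmeans_1d(readings, centers=[0.0, 5.0])
print(centers)
print(groups)
```

With these readings the loop separates the low values from the high ones without being told where the boundary lies, which is the sense in which segmentation can suggest structure in raw data.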
Mining Large Data Sets - Motivation
There is often information “hidden” in the data that is not readily evident
Human analysts may take weeks to discover useful information
Much of the data is never analyzed at all
The Data Gap
[Figure: total new disk (TB) since 1995 compared with the number of analysts. From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”]
What is Data Mining?
Many Definitions
Non-trivial extraction of implicit, previously unknown and potentially useful information from data
Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
What is (not) Data Mining?
What is Data Mining?
Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
What is not Data Mining?
Look up a phone number in a phone directory
Query a Web search engine for information about “Amazon”
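The contrast between lookup and mining can be made concrete with a small sketch. The records, the function names, and the `min_lift` threshold below are invented for illustration and do not come from the slides. Retrieving a phone number is a query for a known key, whereas finding surnames that are over-represented in a city means scanning all records for a pattern nobody asked about explicitly.

```python
from collections import Counter

# Hypothetical toy records: (surname, city, phone). Illustrative only.
RECORDS = [
    ("O'Brien", "Boston", "555-0101"),
    ("O'Reilly", "Boston", "555-0102"),
    ("O'Brien", "Boston", "555-0103"),
    ("Smith", "Boston", "555-0104"),
    ("Smith", "Dallas", "555-0105"),
    ("Garcia", "Dallas", "555-0106"),
    ("Garcia", "Dallas", "555-0107"),
    ("O'Brien", "Dallas", "555-0108"),
]


def lookup_phone(surname, city):
    """Not data mining: retrieve stored values for a known key."""
    return [p for s, c, p in RECORDS if s == surname and c == city]


def overrepresented_surnames(city, min_lift=1.2):
    """Pattern discovery: surnames whose share in `city` exceeds
    their overall share by at least a factor of `min_lift`."""
    overall = Counter(s for s, _, _ in RECORDS)
    local = Counter(s for s, c, _ in RECORDS if c == city)
    n_all = len(RECORDS)
    n_local = sum(local.values())
    result = {}
    for surname, count in local.items():
        lift = (count / n_local) / (overall[surname] / n_all)
        if lift >= min_lift:
            result[surname] = round(lift, 2)
    return result


print(lookup_phone("Smith", "Boston"))
print(overrepresented_surnames("Boston"))
```

The ratio used here (local share divided by overall share) is one simple way to score prevalence; a real analysis would also need enough records per city for the ratio to be meaningful.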
Data mining draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
Traditional Techniques may be unsuitable due to
Enormity of data
High dimensionality of data
Heterogeneous, distributed