Data Mining: Introduction
Why Mine Data? Commercial Viewpoint
Lots of data is being collected and warehoused
Web data, e-commerce
purchases at department/grocery stores
Bank/Credit Card transactions
Computers have become cheaper and more powerful
Competitive Pressure is Strong
Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint
Data collected and stored at enormous speeds (GB/hour)
remote sensors on a satellite
telescopes scanning the skies
microarrays generating gene expression data
scientific simulations generating terabytes of data
Traditional techniques infeasible for raw data
Data mining may help scientists
in classifying and segmenting data
in Hypothesis Formation
Mining Large Data Sets - Motivation
There is often information “hidden” in the data that is not readily evident
Human analysts may take weeks to discover useful information
Much of the data is never analyzed at all
The Data Gap
[Figure: total new disk (TB) since 1995 compared with the number of analysts]
From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”
What is Data Mining?
Many Definitions
Non-trivial extraction of implicit, previously unknown and potentially useful information from data
Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
What is (not) Data Mining?
What is Data Mining?
Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)
What is not Data Mining?
Look up phone number in phone directory
Query a Web search engine for information about “Amazon”
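The contrast between a lookup and a mined pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RECORDS` list is invented toy data, and `lookup`, `prefix_lift`, and the `min_support` threshold are hypothetical names chosen for this sketch, not part of any method from the slides. The lookup returns what is already stored for a known key. The second function scans all records and reports which surname prefixes are over-represented in which city, without being told which prefix to look for.

```python
from collections import Counter

# Hypothetical toy records (surname, city), invented for illustration only.
RECORDS = [
    ("O'Brien", "Boston"), ("O'Reilly", "Boston"), ("O'Rourke", "Boston"),
    ("Smith", "Boston"), ("O'Brien", "Boston"),
    ("Smith", "Denver"), ("Garcia", "Denver"), ("Johnson", "Denver"),
    ("O'Reilly", "Denver"), ("Lee", "Denver"),
]


def lookup(records, surname):
    """Not data mining: return the cities already stored for a known surname."""
    return sorted({city for name, city in records if name == surname})


def prefix_lift(records, prefix_len=2, min_support=3):
    """Closer to data mining: find surname prefixes over-represented in a city.

    lift = P(prefix | city) / P(prefix). Values above 1 mean the prefix is
    more common in that city than in the data overall. Prefixes seen fewer
    than `min_support` times are skipped to avoid reporting noise.
    """
    n = len(records)
    city_totals = Counter(city for _, city in records)
    prefix_totals = Counter(name[:prefix_len] for name, _ in records)
    pair_counts = Counter((name[:prefix_len], city) for name, city in records)
    return {
        (prefix, city): (count / city_totals[city]) / (prefix_totals[prefix] / n)
        for (prefix, city), count in pair_counts.items()
        if prefix_totals[prefix] >= min_support
    }


if __name__ == "__main__":
    print(lookup(RECORDS, "Smith"))
    for (prefix, city), lift in sorted(prefix_lift(RECORDS).items()):
        print(f"{prefix!r} in {city}: lift {lift:.2f}")
```

On this toy data the only prefix that passes the support threshold is `O'`, with a lift above 1 in Boston and below 1 in Denver. A real analysis would need far more records and a proper significance test before calling this a pattern.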
Data mining draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
Traditional Techniques may be unsuitable due to
Enormity of data
High dimensionality of data
Heterogeneous, distributed nature of data