Mining Time Series Data
Outline
Introduction, Motivation
Similarity Measures
Properties of distance measures
Preprocessing the data
Time warped measures
Indexing Time Series
Dimensionality reduction
Discrete Fourier Transform
Discrete Wavelet Transform
Singular Value Decomposition
Piecewise Linear Approximation
Symbolic Approximation
Piecewise Aggregate Approximation
Adaptive Piecewise Constant Approximation
Summary, Conclusions
What are Time Series?
[Figure: plot of an example time series; horizontal axis from 0 to 500, vertical axis from 23 to 29.]
A time series is a collection of observations made sequentially in time.
Note that virtually all similarity measurements, indexing and dimensionality reduction techniques discussed in this tutorial can be used with other data types.
Time Series are Ubiquitous! I
People measure things...
The president's approval rating.
Their blood pressure.
The annual rainfall in Riverside.
The value of their Yahoo stock.
The number of web hits per second.
… and things change over time.
Thus time series occur in virtually every medical, scientific and business domain.
Time Series are Ubiquitous! II
A random sample of 4,000 graphics from 15 of the world’s newspapers published from 1974 to 1980 found that more than 75% of all graphics were time series (Tufte, 1983).
Defining the similarity between two time series is at the heart of most time series data mining applications and tasks.
Thus time series similarity will be the primary focus of this tutorial.
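Time Series Similarity
[Figure: four panels, one per task that depends on time series similarity.]
Classification
Clustering
Rule Discovery (panel annotated with values s and c)
Query by Content (panel shows a query Q used as a template)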
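To make the idea of a similarity measure concrete, here is a minimal sketch of query by content using Euclidean distance. Euclidean distance is only one possible choice, picked here because it is simple; the function names `euclidean_distance` and `query_by_content` and the sample values are hypothetical and do not come from the tutorial.

```python
import math


def euclidean_distance(q, c):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(q) != len(c):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))


def query_by_content(query, database):
    """Return (index, distance) of the series in `database` closest to `query`.

    This is a plain linear scan, assumed here for illustration only.
    """
    distances = [euclidean_distance(query, series) for series in database]
    best = min(range(len(database)), key=distances.__getitem__)
    return best, distances[best]


# Illustrative data (made up): a short query and three candidate series.
query = [23.0, 24.0, 25.0, 26.0]
database = [
    [29.0, 28.0, 27.0, 26.0],
    [23.0, 24.0, 26.0, 26.0],
    [25.0, 25.0, 25.0, 25.0],
]

best_index, best_distance = query_by_content(query, database)
print(best_index, best_distance)
```

The same distance function could serve as the building block for the other tasks listed above, for example nearest-neighbor classification or distance-based clustering. The linear scan compares the query against every stored series, which becomes expensive as the database grows.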
Why is Working With Time Series so Difficult? Part I
1 Hour of EKG data: 1 Gigabyte.
Typical Weblog: 5 Gigabytes per week.
Space Shuttle Database: 158 Giga