DB Seminar Series: The Subspace Clustering Problem
By: Kevin Yip
(17 May 2002)
Presentation Outline
Problem definition
Different approaches
Focus: the projective clustering approach
Problem Definition – Traditional Clustering
Traditional clustering problem: To divide data points into disjoint groups such that the value of an objective function is optimized.
Objective function: to minimize intra-cluster distance and maximize inter-cluster distance.
Distance function: defined over all dimensions, numeric or categorical.
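The slides do not say which objective function is meant, so the following is only a minimal sketch of one common choice: the sum of squared Euclidean distances from each point to its cluster centroid. The names `centroid` and `intra_cluster_cost` and the sample points are hypothetical, chosen for illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


def intra_cluster_cost(clusters):
    """Sum, over all clusters, of squared distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total


# Two ways of splitting the same four 2-D points into two disjoint groups.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
mixed = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]

# The grouping that keeps nearby points together has the lower cost.
print(intra_cluster_cost(tight), intra_cluster_cost(mixed))
```

A clustering algorithm in this setting searches over groupings for one with a low cost; the sketch only evaluates a given grouping.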
Example Problem: clustering points in 2-D space. Distance function: Euclidean distance (d: no. of dimensions, 2 in this case).
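As a small illustration of the distance function named above, here is a sketch of Euclidean distance over d dimensions, the square root of the sum of squared coordinate differences. The function name `euclidean_distance` is an assumption made for this example.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of dimensions d."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Two points in 2-D space (d = 2).
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```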
Example (source: CURE, SIGMOD 1998)
Problem Definition – Distance Function Problem
Observation: distance measures defined over all dimensions are sometimes inappropriate.
Example (source: DOC, SIGMOD 2002)
C1: (x1, x2)
C2: (x2, x3)
C3: (x1, x3)
As the number of noise dimensions increases, distance functions become less and less accurate.
=> For each cluster, besides the set of data points, we also need to find the set of "related dimensions" ("bounded attributes").
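The effect of noise dimensions can be sketched with a small experiment. The setup is entirely hypothetical: points `a` and `b` are close in two relevant dimensions, point `c` is far away in them, and every point receives extra coordinates drawn uniformly from [0, 100]. The ratio of the "same cluster" distance to the "other cluster" distance starts near 0 and moves toward 1 as noise dimensions are added, meaning the full-dimensional distance no longer separates the two cases.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance over all coordinates of p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def with_noise(point, n_noise, rng):
    """Append n_noise uniformly random coordinates in [0, 100] to a point."""
    return list(point) + [rng.uniform(0.0, 100.0) for _ in range(n_noise)]


def same_vs_other_ratio(n_noise, seed=0):
    """d(a, b) / d(a, c), where a and b are close and c is far in the relevant dimensions."""
    rng = random.Random(seed)
    a = with_noise((10.0, 10.0), n_noise, rng)
    b = with_noise((11.0, 10.0), n_noise, rng)
    c = with_noise((50.0, 50.0), n_noise, rng)
    return euclidean(a, b) / euclidean(a, c)


for n in (0, 2, 20, 200):
    print(n, round(same_vs_other_ratio(n), 3))
```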
Problem Definition – The Subspace Clustering Problem
Formal Definition: Given a dataset of N data points and d dimensions, we want to divide the points into k disjoint clusters, each relating to a subset of dimensions, such that an objective function is optimized.
Objective function: usually intra-cluster distance, where each cluster uses its own set of dimensions in the distance calculation.
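A minimal sketch of such an objective, assuming squared deviation from the per-dimension cluster mean as the intra-cluster measure (the slides do not fix a specific one). Each cluster is a pair of its points and the 0-based indices of its related dimensions; `subspace_cost` and the sample data are hypothetical.

```python
def subspace_cost(clusters):
    """Intra-cluster cost where each cluster is measured only on its own dimensions.

    clusters: list of (points, dims) pairs; dims holds 0-based dimension indices.
    """
    total = 0.0
    for points, dims in clusters:
        for j in dims:
            mean_j = sum(p[j] for p in points) / len(points)
            total += sum((p[j] - mean_j) ** 2 for p in points)
    return total


# Two points that agree on x1 and x2 but differ widely on x3.
pts = [(1.0, 1.0, 0.0), (1.0, 1.0, 100.0)]

print(subspace_cost([(pts, (0, 1))]))     # measured on x1, x2 only
print(subspace_cost([(pts, (0, 1, 2))]))  # measured on all three dimensions
```

Measured on its related dimensions only, the cluster is perfectly tight; including the unrelated third dimension makes the same two points look far apart.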
Observation: normal distance functions (Manhattan, Euclidean, etc.) give a smaller value if fewer dimensions are involved.
=> 1. Use a normalized distance function. => 2. Should also try to maximize the number of dimensions.
Example (DOC): $\mathrm{score}(C, D) = |C| \cdot (1/\beta)^{|D|}$, C = points in a c
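Both remedies can be sketched briefly. For the first, dividing a Manhattan distance by the number of dimensions used is one simple normalization; it is an assumption here, since the slides do not name a specific one. For the second, the DOC example is read as |C| multiplied by (1/β) raised to the power |D|; the exponent is inferred from the flattened slide text, and the restriction 0 < β < 1 is assumed so that more dimensions raise the score. All function names and numbers are hypothetical.

```python
def manhattan(p, q, dims):
    """Manhattan distance restricted to the dimensions in dims (0-based indices)."""
    return sum(abs(p[j] - q[j]) for j in dims)


def normalized_manhattan(p, q, dims):
    """Average per-dimension distance, so subspaces of different sizes are comparable."""
    if not dims:
        raise ValueError("dims must not be empty")
    return manhattan(p, q, dims) / len(dims)


def doc_style_score(n_points, n_dims, beta):
    """|C| * (1/beta)^|D|: grows with cluster size and with the number of dimensions."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta is assumed to lie strictly between 0 and 1")
    return n_points * (1.0 / beta) ** n_dims


p = (1.0, 2.0, 3.0, 4.0)
q = (2.0, 3.0, 4.0, 5.0)

# The raw distance shrinks with fewer dimensions; the normalized one does not.
print(manhattan(p, q, (0, 1)), manhattan(p, q, (0, 1, 2, 3)))
print(normalized_manhattan(p, q, (0, 1)), normalized_manhattan(p, q, (0, 1, 2, 3)))

# With beta = 0.25, halving the points but adding one dimension still doubles the score.
print(doc_style_score(100, 2, 0.25), doc_style_score(50, 3, 0.25))
```

A smaller β weights each extra dimension more heavily relative to cluster size, so β acts as the trade-off between the two goals.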
