An Overview of Clustering Methods
With Applications to Bioinformatics
Sorin Istrail
Informatics Research
Celera Genomics
What is Clustering?
Given a collection of objects, put objects into groups based on similarity.
Used for “discovery-based” science, to find unexpected patterns in data.
Also called “unsupervised learning” or “data mining”
Inherently an ill-defined problem
Do we put Collins with Venter because they’re both biologists, or do we put Collins with Lander because they both work for the HGP?
[Figure: people grouped by profession (Biologist, Mathematician) and by affiliation (Celera, HGP)]
Data Representations for Clustering
Input data to algorithm is usually a vector (also called a “tuple” or “record”)
Types of data
Numerical
Categorical
Boolean
Example: Clinical Sample Data
Age (numerical)
Weight (numerical)
Gender (categorical)
Diseased? (boolean)
Must also include a method for computing the similarity of, or the distance between, vectors
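As a rough illustration of the clinical sample record above, the following sketch stores one tuple with mixed attribute types and labels each attribute as numerical, categorical, or boolean. The class name `ClinicalSample`, the helper `attribute_kind`, and the sample values are hypothetical and not part of the original slides.

```python
from typing import NamedTuple


class ClinicalSample(NamedTuple):
    """One input record ("tuple") with mixed attribute types."""
    age: float       # numerical
    weight: float    # numerical
    gender: str      # categorical
    diseased: bool   # boolean


def attribute_kind(value: object) -> str:
    """Classify a single attribute value by its data type."""
    # bool must be tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numerical"
    return "categorical"


# Hypothetical values, for illustration only.
sample = ClinicalSample(age=54.0, weight=81.5, gender="female", diseased=False)
kinds = {name: attribute_kind(value) for name, value in sample._asdict().items()}
```

A distance or similarity function then has to handle each kind of attribute separately, which is why the data types matter.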
Calculating Distance
Distance is the most natural method for numerical data
Lower values indicate more similarity
Distance metrics
Euclidean distance
Manhattan distance
Etc.
Does not generalize well to non-numerical data
What is the distance between “male” and “female”?
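The two distance metrics named above can be sketched as follows. This is a minimal plain-Python version that assumes equal-length numerical vectors; the function names are illustrative.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """City-block distance: sum of the absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


p = [0.0, 0.0]
q = [3.0, 4.0]
d_euclid = euclidean_distance(p, q)     # 5.0
d_manhattan = manhattan_distance(p, q)  # 7.0
```

Both functions return 0 for identical vectors and grow as the vectors move apart, so lower values indicate more similarity.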
Calculating Numerical Similarity
Traditionally over the range [0, 1]
0 = no similarity, 1 = identity
Converting distance to similarity
Distance and similarity are two sides of the same coin
To obtain similarity from distance, divide the distance by the maximum pairwise distance and subtract the result from 1
Pearson correlation
Removes magnitude effects
In range [-1, 1]
-1 = anti-correlated, 0 = no correlation, 1 = perfectly correlated
In the example below, the red and blue lines have high correlation, even though the distance between the lines is significant
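The two numerical similarity measures on this slide can be sketched as below. The conversion normalizes by the maximum pairwise distance so that the result lies in [0, 1]; that normalization is an assumption, since the slide's exact formula is not preserved. The `red` and `blue` vectors are made-up data with the same shape but a large offset.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def similarity_from_distance(d: float, d_max: float) -> float:
    """Map a distance in [0, d_max] to a similarity in [0, 1] (assumed form)."""
    if d_max <= 0:
        raise ValueError("maximum pairwise distance must be positive")
    return 1.0 - d / d_max


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, in the range [-1, 1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Made-up data: blue is red shifted up by 10, so the shapes are identical.
red = [1.0, 2.0, 3.0, 4.0]
blue = [11.0, 12.0, 13.0, 14.0]

distance = euclidean_distance(red, blue)      # large: 20.0
correlation = pearson_correlation(red, blue)  # perfectly correlated
```

Because the means are subtracted and the result is scaled by the spread of each vector, the constant offset between `red` and `blue` has no effect on the correlation, which is what "removes magnitude effects" refers to.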
Calculating Boolean Similarity
Given two boolean vectors X and Y, let A be the number of places where both are 1, etc. as shown below.
Two standard methods for similarity given at right
Can be generalized to handle categorical data as well.
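The counting scheme described above can be sketched as follows. Only A (both 1) is defined in the surviving text; the definitions of B, C, and D used here, and the choice of the simple matching and Jaccard coefficients as the two measures, are assumptions, because the slide's table and formulas are not preserved.

```python
from typing import Sequence, Tuple


def contingency_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count positions by the pair of values (x_i, y_i).

    a: both 1; b: x is 1, y is 0; c: x is 0, y is 1; d: both 0.
    The b, c, d naming is an assumption, not taken from the slides.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Fraction of positions where the two vectors agree (1-1 or 0-0)."""
    a, b, c, d = contingency_counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Shared 1s divided by positions where at least one vector is 1."""
    a, b, c, _ = contingency_counts(x, y)
    if a + b + c == 0:
        raise ValueError("Jaccard is undefined when both vectors are all zeros")
    return a / (a + b + c)


x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
counts = contingency_counts(x, y)  # (2, 1, 1, 1)
```

The two coefficients differ in whether shared 0s count as agreement: simple matching includes them, while Jaccard ignores them, which matters when 1s are rare.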
Boolean Similarity
