Discrimination and clustering with microarray gene expression data
Terry Speed, Jane Fridlyand, Yee Hwa Yang and Sandrine Dudoit*
Department of Statistics, UC Berkeley,
*Department of Biochemistry, Stanford University
ENAR, Charlotte NC, March 27 2001
Outline
Comments
Classification
Clustering
A synthesis
Concluding remarks
Tumor classification
A reliable and precise classification of tumors is essential for successful treatment of cancer.
Current methods for classifying human malignancies rely on a variety of morphological, clinical and molecular variables.
In spite of recent progress, there are still uncertainties in diagnosis. Also, it is likely that the existing classes are heterogeneous.
DNA microarrays may be used to characterize the molecular variations among tumors by monitoring gene expression on a genomic scale.
Tumor classification, ctd
There are three main types of statistical problems associated with tumor classification:
1. The identification of new/unknown tumor classes using gene expression profiles;
2. The classification of malignancies into known classes;
3. The identification of “marker” genes that characterize the different tumor classes.
These issues are relevant to other questions we meet, e.g. characterising/classifying neurons or the toxicity of chemicals administered to cells or model animals.
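The three problem types can be sketched on a toy data set. The sketch below is only an illustration under stated assumptions: it takes a p x n expression matrix X (genes in rows, samples in columns) and a label vector y, and it uses three deliberately simple stand-ins that are not taken from the slides: a two-group k-means for class discovery, a nearest-centroid rule for classification into known classes, and a Welch-type t statistic for ranking marker genes. All function names and all numbers are hypothetical.

```python
import numpy as np

def two_means(X, n_iter=20):
    """Problem 1 (class discovery): split samples into two groups, labels unused."""
    S = X.T.astype(float)  # samples as rows, shape (n, p)
    # Deterministic start: sample 0 and the sample farthest from it.
    far = ((S - S[0]) ** 2).sum(axis=1).argmax()
    centers = np.stack([S[0], S[far]])
    for _ in range(n_iter):
        d = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = S[labels == k].mean(axis=0)
    return labels

def nearest_centroid_predict(X_train, y_train, X_new):
    """Problem 2 (known classes): assign each new sample to the closest class mean."""
    classes = np.unique(y_train)
    # One mean expression profile per class, shape (k, p).
    centroids = np.stack([X_train[:, y_train == c].mean(axis=1) for c in classes])
    # Squared Euclidean distance of every new sample to every centroid, shape (m, k).
    d = ((X_new.T[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def rank_marker_genes(X, y):
    """Problem 3 (marker genes): order genes by |Welch t| between two classes."""
    a, b = np.unique(y)  # assumes exactly two classes
    xa, xb = X[:, y == a], X[:, y == b]
    se = np.sqrt(xa.var(axis=1, ddof=1) / xa.shape[1]
                 + xb.var(axis=1, ddof=1) / xb.shape[1])
    t = (xa.mean(axis=1) - xb.mean(axis=1)) / se
    return np.argsort(-np.abs(t))  # most discriminating gene first

# Made-up data: 4 genes x 6 samples, first three samples in class 0.
X = np.array([
    [ 2.0,  2.2,  1.8, -2.0, -2.1, -1.9],  # strongly differential
    [ 0.1, -0.1,  0.0,  0.1, -0.1,  0.0],  # flat
    [ 0.5,  0.4,  0.6,  0.0,  0.1, -0.1],  # mildly differential
    [-1.0, -1.2, -0.8,  1.0,  1.1,  0.9],  # differential, opposite sign
])
y = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[1.9, -2.0], [0.0, 0.0], [0.5, 0.0], [-0.9, 1.0]])  # 2 new samples

groups = two_means(X)                               # uses X only
predicted = nearest_centroid_predict(X, y, X_new)   # uses X and y
ranking = rank_marker_genes(X, y)                   # uses X and y
```

Note the difference in inputs: class discovery works from X alone, while classification and marker selection both need the known labels y.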
Gene Expression Data
Gene expression data on p genes for n samples
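Rows: genes (i = 1, ..., p). Columns: mRNA samples (j = 1, ..., n).
Entry (i, j) = gene expression level of gene i in mRNA sample j
  = Log(Red intensity / Green intensity), or
  = Log(Avg. PM - Avg. MM)
[Figure: example expression matrix; rows = genes 1, 2, 3, 4, 5, ...; columns = sample1, sample2, sample3, sample4, sample5, ...]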
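As a minimal sketch of how such a matrix might be assembled for the red/green case, the snippet below computes Log(Red intensity / Green intensity) entrywise. The function name, the intensities, and the choice of base 2 are assumptions made for illustration; the slides do not state a logarithm base.

```python
import numpy as np

def log_ratio_matrix(red, green, base=2.0):
    """Return the p x n matrix of log(red / green) expression values.

    red, green: arrays of shape (p, n) with positive intensities for
    p genes (rows) and n mRNA samples (columns).
    base: logarithm base (base 2 is an assumption, not from the slides).
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    if red.shape != green.shape:
        raise ValueError("red and green must have the same shape")
    if np.any(red <= 0) or np.any(green <= 0):
        raise ValueError("intensities must be positive")
    return np.log(red / green) / np.log(base)

# Made-up intensities for p = 3 genes and n = 2 samples.
red = np.array([[200.0, 50.0], [100.0, 100.0], [25.0, 400.0]])
green = np.array([[100.0, 100.0], [100.0, 100.0], [100.0, 100.0]])
X = log_ratio_matrix(red, green)  # shape (3, 2); entry (i, j) is gene i in sample j
```

Positive entries mean the red channel is brighter than the green one for that gene and sample, negative entries the reverse, and zero means equal intensity.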
Comparison of discrimination methods
In this field many people are inventing new methods of classification or using complex ones (e.g. SVMs). Is this necessary?
We did a comparison of several methods on three publicly available tumor