Microarray Data Standardisation
Microarray Gene Expression Database group -- MGED
December, 2000
Public data repositories for microarray data
There is a growing consensus in the life science community on the need for public repositories of gene expression data, analogous to DDBJ/EMBL/GenBank for sequences
Some of the reasons:
Gradually building up gene expression profiles for organisms, tissues, cell types, developmental stages and various states, and under the influence of compounds
Building up systematic knowledge about gene functions through links to other genomics databases
Comparison of profiles, and access to and analysis of data by third parties
Cross validation of results and platforms - quality control
Systematic gene expression profiling initiatives in public domain
The International Life Sciences Institute (ILSI) is coordinating a program undertaken by ~25 pharmaceutical and other companies to generate toxicity-related gene expression data under defined experimental conditions:
evaluate gene expression profiles in standardised test systems following exposure to toxicants
relate changes in gene expression to other measures of toxicity
Microarray data handling and analysis - a major bottleneck (Calculations by Jerry Lanfear)
Experiments:
100 000 genes in human
320 cell types
compounds
3 time points
2 concentrations
2 replicates
Data
8 x 10^11 data points
1 x 10^15 bytes = 1 petabyte of data
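The arithmetic behind this estimate can be sketched as follows. The number of compounds is missing from the list above, so the value of 2000 used here is an assumption, chosen only because it reproduces the quoted figure of roughly 8 x 10^11 data points. The bytes-per-data-point value is likewise derived from the two quoted totals, not stated in the slides.

```python
# Rough reconstruction of the data-volume estimate.
# All factors except `compounds` come from the list above.
genes = 100_000
cell_types = 320
compounds = 2_000        # ASSUMPTION: count not given in the source
time_points = 3
concentrations = 2
replicates = 2

# One data point per gene per experimental condition.
data_points = (genes * cell_types * compounds
               * time_points * concentrations * replicates)

# The slides quote about 1 petabyte in total; the storage cost per
# data point implied by that total is derived here, not sourced.
total_bytes = 10 ** 15
bytes_per_data_point = total_bytes / data_points

print(f"data points: {data_points:.2e}")
print(f"implied bytes per data point: {bytes_per_data_point:.0f}")
```

Under these assumptions the implied cost is on the order of a kilobyte per data point, which is plausible only if the raw images are stored rather than just the extracted expression values.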
Expression data repository projects
Public repositories in the making:
GEO - NCBI
GeneX - NCGR
ArrayExpress - EBI
In-house databases - Stanford, MIT, University of Pennsylvania,
Organism-specific databases: mouse at the Jackson Laboratory
Proprietary databases - Gene Logic, NCI
Difficulties
Raw data are images
Higher-level analysis and mining require a gene expression matrix (genes/samples/gene expression levels)
Lack of standard measurement units for gene expression
Lack of standards for sample annotation
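A gene expression matrix of the kind described above can be sketched as a table with one row per gene and one column per sample. This is a minimal illustration, not a proposed standard: the gene names, sample names and values are hypothetical, and the expression levels are unitless placeholders because no standard measurement unit exists.

```python
# Hypothetical measurements: (gene, sample, expression level).
# Names and values are illustrative only.
measurements = [
    ("geneA", "sample1", 2.0),
    ("geneA", "sample2", 0.5),
    ("geneB", "sample1", 1.0),
    ("geneB", "sample2", 4.0),
]


def build_expression_matrix(records):
    """Arrange (gene, sample, level) records as a genes x samples matrix.

    Returns the row labels (genes), the column labels (samples) and the
    matrix as a list of rows. Missing measurements are stored as None.
    """
    genes = sorted({gene for gene, _, _ in records})
    samples = sorted({sample for _, sample, _ in records})
    levels = {(gene, sample): level for gene, sample, level in records}
    matrix = [[levels.get((gene, sample)) for sample in samples]
              for gene in genes]
    return genes, samples, matrix


genes, samples, matrix = build_expression_matrix(measurements)

# Print the matrix with its row and column labels.
print("gene", *samples, sep="\t")
for gene, row in zip(genes, matrix):
    print(gene, *row, sep="\t")
```

Keeping the row and column labels alongside the values matters here, since the sample annotation is exactly the part that lacks a standard.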
Raw data - images
Treated sample labeled red (Cy5)
Control sample labeled green (Cy3)
Competitive hybridization onto chip
R