Temporal Probabilistic Concepts from Heterogeneous Data Sequences
Title & Authors
Sally McClean, Bryan Scotney, Fiona Palmer
School of Information & Software Engineering,
University of Ulster.
Gene Expression
Background
Scientists have now sequenced the entire human genome, which contains approximately 30,000 genes.
Each of these genes, when active, results in the production of a protein, and proteins have a variety of functions.
In order to understand the function of the genes, and the related proteins, scientists are interested in determining where and when the genes are active.
The steps involved in producing a protein from a gene: Gene (DNA) -> RNA -> Protein
Gene Expression Results.
The DNA microarray is a microscope slide which enables scientists to determine the activity, or expression, of genes.
On each microarray spot, scientists place an extract of the cells along with an extract from a reference sample.
The more RNA produced, the more active the gene (green for the sample and red for the reference).
The fluorescence of the spot is then measured to give the expression of the gene compared to the reference.
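As a rough illustration of how a spot's fluorescence can be turned into a single expression value relative to the reference, the sketch below computes a log2 ratio of the two channel intensities. The slides do not say which formula was used, so the log-ratio form, the function name `log_ratio`, and the intensity values are assumptions made for illustration only.

```python
import math


def log_ratio(sample_intensity, reference_intensity):
    """Return log2(sample / reference) for one microarray spot.

    Assumption: expression relative to the reference is summarised as a
    log2 ratio of the two fluorescence intensities. The slides do not
    state the exact formula.
    """
    if sample_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / reference_intensity)


# Hypothetical intensities: the sample channel is twice as bright as the reference.
print(log_ratio(200.0, 100.0))
```

Under this convention a positive value means the gene is more active in the sample than in the reference, zero means equal activity, and a negative value means lower activity.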
The Gene Expression Data Set
The gene expression data set analysed describes the expression of 112 genes in the rat cervical spinal cord over 9 time points through the development of the rat from embryo to adult.
Only specific genes were analysed which are considered important in the development of the central nervous system in the rat.
Time points: E11, E13, E15, E18, E21, P0, P7, P14, A
(E = embryo, days since conception; P = postnatal, days since birth; A = adult)
The temporal nature of the gene expression data
Clustering, Mutual Information
Clustering
Clustering is usually based on a distance metric; in this case, mutual information is used.
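To make the measure concrete, here is a minimal sketch of the mutual information between two discrete sequences, estimated from their empirical frequencies. The function name `mutual_information` and the toy sequences are hypothetical; the slides do not show the authors' implementation or say how the value is converted into a distance for clustering.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in bits) of two equal-length discrete sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)           # marginal counts of x
    count_y = Counter(y)           # marginal counts of y
    count_xy = Counter(zip(x, y))  # joint counts of (x, y) pairs
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical discretised profiles (not taken from the data set).
profile_a = [0, 0, 1, 1]
profile_b = [0, 0, 1, 1]  # identical to profile_a
profile_c = [0, 1, 0, 1]  # statistically independent of profile_a

print(mutual_information(profile_a, profile_b))
print(mutual_information(profile_a, profile_c))
```

Two profiles that rise and fall together share high mutual information, while unrelated profiles share little, which is what lets the measure drive the grouping of genes.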
Before clustering, the continuous gene expression values were discretised by partitioning the expression range into three equal-sized bins.
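The discretisation step might look like the following sketch. It assumes that "equal sized" means equal-width bins spanning each gene's own minimum-to-maximum range; the slides do not say whether equal width or equal frequency was intended, and the function name `discretise` and the example values are hypothetical.

```python
def discretise(values, n_bins=3):
    """Map continuous values to bin labels 0..n_bins-1 using equal-width bins.

    Assumption: the bins split the range [min(values), max(values)] into
    n_bins intervals of equal width. The slides do not confirm this choice.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant profile has no range to split, so every value falls in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical expression values for one gene over four time points.
print(discretise([0.0, 3.0, 6.0, 9.0]))
```

Each gene's profile then becomes a short sequence of labels 0, 1 and 2, one label per time point.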
Gene      E11  E13  E15  E18  E21  P0   P7   P14  A
nAChRa2   0    0    0    1    2    2    2    2    1
mAChR2    0    0    0    2    2    2