Gene Ontology Driven Classification of Gene Expression Patterns
Claudio Lottaz and Rainer Spang
Computational Diagnostics, Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics
11/11/2017
Overview
Introduction
Gene Ontology driven classification of gene expression patterns
Preliminary evaluation on leukemia
Limitations and future work
Conclusions
Problem Statement
Classify gene expression patterns into classes with biological meaning
Typical training data for supervised learning:
Many genes
Few annotated samples
Typical difficulties:
Overfitting
Lack of intuitive rationale for classifications
Introduction
Logistic Regression
Method:
Generalised linear statistical model
Determine weights for each input variable
Sigmoidal function maps the classifier output onto the interval [0, 1]
So far we only consider the binary case
Limitations
Only works with few input variables
Troubled by collinear variables
No biological knowledge
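The method above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy data, the learning rate, and the use of stochastic gradient descent are assumptions made for the example.

```python
import math

def sigmoid(z):
    """Map any real-valued score onto the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def predict(weights, bias, x):
    """Weighted sum of the input variables, squashed by the sigmoid."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit(samples, labels, lr=0.1, epochs=500):
    """Determine one weight per input variable (binary case only).

    Uses stochastic gradient descent on the log-loss; lr and epochs
    are arbitrary values chosen for this toy example.
    """
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Made-up data: two expression values per sample and a binary class label.
X = [[0.2, 1.1], [0.4, 0.9], [1.8, 0.1], [2.1, 0.3]]
y = [0, 0, 1, 1]

w, b = fit(X, y)
probs = [predict(w, b, x) for x in X]
```

With only two input variables and four samples the fit is well behaved; the limitations listed above appear once the number of variables approaches or exceeds the number of samples.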
Gene Ontology
Structured knowledge about genes
Directed acyclic graph
Represents knowledge on
Molecular function
Biological process
Cellular component
Genes are annotated to nodes in the graph
GO:0003673 Gene Ontology
GO:0003674 molecular function
GO:0008150 biological process
GO:0005575 cellular component
...
...
...
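One possible way to hold such a graph in memory is a pair of dictionaries, one for the edges and one for the gene annotations. The sketch below uses the four nodes named on the slide; the gene names and the helper `genes_below` are hypothetical and serve only as illustration.

```python
# Parent -> children edges for the top of the graph.
CHILDREN = {
    "GO:0003673": ["GO:0003674", "GO:0008150", "GO:0005575"],
    "GO:0003674": [],
    "GO:0008150": [],
    "GO:0005575": [],
}

NAMES = {
    "GO:0003673": "Gene Ontology",
    "GO:0003674": "molecular function",
    "GO:0008150": "biological process",
    "GO:0005575": "cellular component",
}

# Hypothetical annotations: a gene may be annotated to several nodes.
ANNOTATIONS = {
    "GO:0003674": {"geneA"},
    "GO:0008150": {"geneA", "geneB"},
}

def genes_below(go_id):
    """Genes annotated to a node or to any of its descendants.

    Sets are used because in a directed acyclic graph the same node,
    and therefore the same gene, can be reached along several paths.
    """
    found = set(ANNOTATIONS.get(go_id, set()))
    for child in CHILDREN.get(go_id, []):
        found |= genes_below(child)
    return found

all_genes = genes_below("GO:0003673")
```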
One Classifier per GO-Node
One GO node has
Identifier, name, description
Children (other GO nodes)
Probe-set annotations
One logistic regression per node
Same classification task in each node
Smaller sets of input variables (directly annotated genes and direct children)
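A node and its input variables could be represented as follows. This is a sketch under assumptions: the class `GONode`, the function `input_vector`, and all identifiers and values in the example are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GONode:
    identifier: str
    name: str
    description: str = ""
    children: list = field(default_factory=list)    # identifiers of child GO nodes
    probe_sets: list = field(default_factory=list)  # directly annotated probe sets

def input_vector(node, expression, child_outputs):
    """Input variables for the logistic regression of one node.

    Expression values of the directly annotated probe sets come first,
    followed by the classifier outputs of the direct children.
    """
    direct = [expression[p] for p in node.probe_sets]
    inherited = [child_outputs[c] for c in node.children]
    return direct + inherited

# Hypothetical node with two probe sets and two children.
node = GONode(
    identifier="GO:example",
    name="example term",
    children=["GO:child_a", "GO:child_b"],
    probe_sets=["probe_1", "probe_2"],
)
expression = {"probe_1": 0.7, "probe_2": -1.2}         # one sample, made-up values
child_outputs = {"GO:child_a": 0.9, "GO:child_b": 0.2}  # made-up child results

features = input_vector(node, expression, child_outputs)
```

Each node then sees only a handful of inputs, regardless of how many genes are annotated further down in the graph.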
GO driven gene expression classification
Bottom-up Information Collection
Start with leaf-nodes
Use results of these to train their parents
Post-order traversal of the directed graph from its root
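The traversal order can be sketched as below. The toy graph and its node names are hypothetical; the point is that every node is emitted only after all of its children, so their results are available when the node itself is trained.

```python
def post_order(children, root):
    """Return the nodes so that each one appears after all of its children."""
    order, seen = [], set()

    def visit(node):
        if node in seen:  # a node with several parents is handled only once
            return
        seen.add(node)
        for child in children.get(node, []):
            visit(child)
        order.append(node)  # all children are done: this node can be trained

    visit(root)
    return order

# Hypothetical toy graph; "leaf_c" has two parents, which a tree could not express.
children = {
    "root": ["inner_a", "inner_b"],
    "inner_a": ["leaf_c", "leaf_d"],
    "inner_b": ["leaf_c"],
}

order = post_order(children, "root")
# order == ["leaf_c", "leaf_d", "inner_a", "inner_b", "root"]
```

The `seen` set matters because the graph is not a tree: without it, a node with several parents would be trained once per parent.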
GO:0004386 helicase
GO:0003876 DNA helicase
GO:0008026 ATP dependent helicase
GO:0003724 RNA helicase
GO:000