Case-based reasoning driven gene annotation
G. Christian Overton and Juergen Haas
Center for Bioinformatics, Univ. of Pennsylvania
1. Introduction
Human Genome Project
Flood of information on genetic and physical maps of chromosomes.
Automatic data management
Ongoing research
“Data mining”, “knowledge discovery”
Biological data
Extremely abundant
Relatively sparse
Not generally suitable for statistical methods
1. Introduction (cont’d)
Case-based reasoning (CBR)
Closely models the approach often taken by biologists.
Can be viewed as a formalization of one form of reasoning about biological systems.
From a DB perspective
Provides a convenient framework for exploring how to automatically maintain an accurate and consistent view of the data.
From a biology perspective, CBR is an appropriate framework
For making heuristic predictions over sparse features.
For building a composite view of our current state of knowledge.
2. Case-based reasoning
CBR in its general meaning (Schank [2], 1980s)
Broad steps
(1) Build a case-base of known, characterized and indexed case instances.
(2) Retrieve a case (or set of cases) from the case-base that is similar to the query case.
(3) Adapt information from the known case(s) and apply it to the query case.
(4) Evaluate the solution proposed in the adaptation step and, if necessary, repeat the adaptation step by, for example, selecting additional cases or repairing the solution proposed in the previous round of adaptation.
(5) Store the new solution in the case-base.
[Figure: the case-base, containing source cases (well-characterized instances), and a query case (new situation)]
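The five broad steps can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' system: the `Case`, `CaseBase`, and `annotate` names, the Jaccard similarity over sparse feature sets, the caller-supplied `evaluate` check, and the toy gene names and features are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Case:
    """A case: a sparse set of observed features plus an optional annotation."""
    name: str
    features: frozenset
    annotation: Optional[str] = None  # None for an uncharacterized query case


def similarity(a: Case, b: Case) -> float:
    """Jaccard similarity over feature sets (an assumed, illustrative measure)."""
    union = a.features | b.features
    return len(a.features & b.features) / len(union) if union else 0.0


class CaseBase:
    def __init__(self, cases: Iterable[Case] = ()):
        # Step 1: build the case-base from known, characterized cases.
        self.cases: List[Case] = list(cases)

    def retrieve(self, query: Case, exclude=frozenset()) -> List[Case]:
        # Step 2: rank source cases by similarity to the query case.
        candidates = [c for c in self.cases if c.name not in exclude]
        return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)

    def store(self, case: Case) -> None:
        # Step 5: keep the newly solved case for future queries.
        self.cases.append(case)


def annotate(query: Case, case_base: CaseBase,
             evaluate: Callable[[Case, Optional[str]], bool],
             max_rounds: int = 3) -> Optional[Case]:
    """Run retrieve/adapt/evaluate rounds; return the solved case or None."""
    tried = set()
    for _ in range(max_rounds):
        ranked = case_base.retrieve(query, exclude=tried)
        if not ranked:
            break
        source = ranked[0]
        tried.add(source.name)
        # Step 3: adapt. Here the source annotation is simply transferred.
        proposed = source.annotation
        # Step 4: evaluate. On rejection, loop again with the next-best case.
        if evaluate(query, proposed):
            solved = Case(query.name, query.features, proposed)
            case_base.store(solved)
            return solved
    return None


# Toy, hypothetical data: two characterized genes and one query gene.
case_base = CaseBase([
    Case("geneA", frozenset({"f1", "f2", "f3"}), "kinase"),
    Case("geneB", frozenset({"f3", "f4"}), "transporter"),
])
query = Case("geneX", frozenset({"f1", "f2"}))
result = annotate(query, case_base, evaluate=lambda q, ann: True)
```

In this sketch, adaptation is a plain copy of the annotation; a real annotation system would presumably repair or combine information from several source cases, and `evaluate` would encode domain checks rather than accept everything.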
3. CBR in biology
Similarity by descent (e.g., the brain is common to all vertebrates)
Techniques developed by the community
well suited for reasoning about biological systems
Advantages over generalization-based methods
data too sparse to drive statistical analysis
no sound biological model
more efficient and reliable
Examples
CBR (and the closely related technique of memory-based reasoning) applied to predicting secondary structure in glo