Data Mining in Bioinformatics
Outline
Introduction
Overview of Microarray Problem
Image Analysis
Data Mining
Validation
Summary
Introduction: Recommended Literature
1. Bioinformatics – The Machine Learning Approach by P. Baldi & S. Brunak, 2nd edition, The MIT Press, 2001
2. Data Mining – Concepts and Techniques by J. Han & M. Kamber, Morgan Kaufmann Publishers, 2001
3. Pattern Classification by R. Duda, P. Hart and D. Stork, 2nd edition, John Wiley & Sons, 2001
Introduction: Microarray Problem in Bioinformatics Domain
Problems in Bioinformatics Domain
Data production at the levels of molecules, cells, organs, organisms, populations
Integration of structure and function data, gene expression data, pathway data, phenotypic and clinical data, …
Prediction of molecular function and structure
Computational biology: synthesis (simulations) and analysis (machine learning)
Microarray Problem: Major Objective
Major Objective: Discover a comprehensive theory of life's organization at the molecular level
The major actors of molecular biology: the nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)
The central dogma of molecular biology
Proteins are complicated molecules built from 20 different amino acids.
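The central dogma (DNA is transcribed to RNA, RNA is translated to protein) can be illustrated with a minimal Python sketch. It is not part of the original slides. It assumes the input is the coding strand of the DNA and uses the standard genetic code; the function names `transcribe` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].replace("U", "T")  # the table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCTAA")
protein = translate(rna)
```

Here `ATG` encodes methionine (M), `GCC` encodes alanine (A), and `TAA` is a stop codon. The table maps 64 codons onto exactly 20 distinct amino acids plus the stop signal.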
Input and Output of Microarray Data Analysis
Input: Laser image scans (data) and underlying experiment hypotheses or experiment designs (prior knowledge)
Output:
Conclusions about the input hypotheses or knowledge about statistical behavior of measurements
The theory of biological systems learnt automatically from data (machine learning perspective)
Model fitting, Inference process
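As a rough illustration of going from measurements to a conclusion about a hypothesis, the sketch below is one possible reading, not the method of the slides. It assumes a two-channel experiment in which each spot yields a red and a green intensity, and it tests the hypothesis that a gene's mean log ratio is zero with a one-sample t statistic. The function names and the sample values are hypothetical.

```python
import math
import statistics


def log2_ratios(red, green):
    """Per-replicate log2(red / green) expression ratios."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def one_sample_t(values, mu0=0.0):
    """t statistic for the hypothesis that the mean of values equals mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return (mean - mu0) / (sd / math.sqrt(n))


# Hypothetical intensities for one gene across three replicate arrays.
red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]

ratios = log2_ratios(red, green)
t_stat = one_sample_t(ratios)
```

A large absolute t statistic is evidence against the hypothesis of no differential expression. In practice the statistic would be compared against a t distribution with n - 1 degrees of freedom, and corrected for testing many genes at once.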
Overview of Microarray Problem
[Figure: diagram linking Biology Application Domain, Experiment Design and Hypothesis, Microarray Experiment, Image Analysis, Data Analysis, Data Mining, Validation, Artificial Intelligence (AI), Knowledge Discovery in Databases (KDD), and Data Warehouse]
Artificial Intelligence (AI) Community
Issues:
Prior knowledge (e.g., invariance)
Model deviation from true model
Sampling