Text and Data Mining for Bioinformatics
Hinrich Schütze, Novation Biosciences
Russ Altman, Stanford University
Part I (Russ Altman)
Text and Data Mining: A key technology in the biological information revolution
Mining of microarray/genechip data
Integrating textual and non-textual data for protein sequence analysis
Part II (Hinrich Schütze)
Association and clustering
Information extraction
Structure mining
Resources
Software, Data, Publications
Status
This file contains raw material for the tutorial.
It will be significantly expanded and reworked if the tutorial is accepted.
Only the text mining part of the tutorial is covered in this file.
Text Mining Technologies
[Figure: approaches plotted on two axes, cost effectiveness (low to high) and utility (low to high). Labels in the figure: Artificial Intelligence (Cyc); Information Extraction (Fastus); Primary Literature (Reading); Keyword-based Retrieval (PubMed); Structure Mining; Manual Knowledge Representation (Riboweb).]
Problems Addressed by NLP
Information overload
Primary literature
New data-intensive methods
Manual curation is slow, costly, inflexible
Progress is hampered by fragmentation of research
Natural Language Processing: Applications
Data analysis
Data integration
Better information retrieval
Semi-automated curation
Structure mining
Association and clustering
Information Extraction
Application: Drug discovery
Steps in finding a drug
Disease
Protein
Gene
All variants of the gene
All genes in a pathway with this gene
Annotation can reduce the set of targets from tens of thousands to hundreds
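The narrowing step described above can be pictured as a filter over annotated candidates. The following is a minimal sketch only, not the tutorial's method: the Gene record, the filter_targets helper, the gene symbols, and the annotation terms are all hypothetical and chosen for illustration. It assumes each candidate gene carries a set of annotation terms and keeps only the genes that carry every required term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record: a gene symbol plus its annotation terms."""
    symbol: str
    annotations: frozenset


def filter_targets(genes, required):
    """Keep genes whose annotation set contains every required term."""
    required = frozenset(required)
    return [g for g in genes if required <= g.annotations]


# Toy candidates (invented for illustration, not real genes or annotations).
candidates = [
    Gene("GENE_A", frozenset({"kinase", "membrane", "pathway:X"})),
    Gene("GENE_B", frozenset({"kinase", "nucleus"})),
    Gene("GENE_C", frozenset({"membrane", "pathway:X"})),
    Gene("GENE_D", frozenset({"kinase", "membrane", "pathway:X", "secreted"})),
]

# Require both a protein class and membership in the pathway of interest.
targets = filter_targets(candidates, {"kinase", "pathway:X"})
print([g.symbol for g in targets])  # ['GENE_A', 'GENE_D']
```

In this toy setting each additional required term can only shrink the candidate set, which is the sense in which richer annotation reduces the number of targets to examine.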