Genome Annotation
David Coates
The Drosophila Story
Introduction
sequence published and annotated over several years
test case for the ‘random’ sequencing approach: 4 months, then 2 months of annotation
Comparing genomes
Ce (C. elegans): 19,105 protein sequences, 100 Mbp
Dm (D. melanogaster): ca. 13,600 protein sequences, 180 Mbp total, 120 Mbp euchromatin
Hs (H. sapiens): ? 70,000 genes
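The figures above can be turned into a rough gene density for comparison. The following is a minimal sketch that uses only the numbers quoted on the slide. The dictionary layout and the function name `density_per_mbp` are illustrative choices, the Dm count is approximate ("ca."), and Hs is left out because its gene count is marked as uncertain.

```python
# Figures copied from the slide; the Dm protein count is approximate ("ca. 13,600").
# Hs is omitted because the slide marks its gene count as uncertain.
genomes = {
    "Ce": {"proteins": 19105, "size_mbp": 100},
    "Dm (total)": {"proteins": 13600, "size_mbp": 180},
    "Dm (euchromatin)": {"proteins": 13600, "size_mbp": 120},
}


def density_per_mbp(proteins, size_mbp):
    """Return predicted protein sequences per megabase pair."""
    return proteins / size_mbp


densities = {name: density_per_mbp(**g) for name, g in genomes.items()}

for name, value in densities.items():
    print(f"{name}: {value:.1f} protein sequences per Mbp")
```

On these numbers the fly has fewer predicted protein sequences per Mbp than the worm, whether the total or only the euchromatic genome size is used.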
The key steps
Sequence the genome (!)
Assemble it
Predict the genes
Assess the likelihood
Annotate as appropriate…
Drosophila melanogaster
Sequence
Random sequencing approach
Small fragments cloned and sequenced to x4, x6, x10, x12 coverage
Contigs generated by computer analysis for overlaps
BAC and YAC ends sequenced
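To give a feel for what the coverage levels above mean, here is a small sketch based on the standard Poisson (Lander-Waterman style) approximation, in which reads are assumed to land uniformly at random. That assumption ignores repeats and cloning bias, so the results are only indicative. The 120 Mbp target is the euchromatin figure quoted earlier. The 500 bp read length is a hypothetical value chosen for illustration, not a number from the slides.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average coverage: total sequenced bases divided by target size."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(c):
    """Poisson approximation: chance that a given base is hit by no read."""
    return math.exp(-c)


GENOME_SIZE = 120_000_000  # bp, euchromatin figure from the slides
READ_LENGTH = 500          # bp, hypothetical value for illustration only

for c in (4, 6, 10, 12):
    reads_needed = math.ceil(c * GENOME_SIZE / READ_LENGTH)
    missing_bp = expected_uncovered_fraction(c) * GENOME_SIZE
    print(f"{c}x: about {reads_needed:,} reads, "
          f"roughly {missing_bp:,.0f} bp expected to stay unsequenced")
```

Under this simple model the unsequenced fraction falls off exponentially with coverage, which is one reason for pushing beyond 4x.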
Drosophila melanogaster
Assembly
Scaffold generated with BAC end and STS data
Sequenced fragments built into contigs onto scaffold
Some 1,000 gaps still exist, most about the size of a P element. These are still being filled in.
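The idea of building contigs from overlapping fragments can be illustrated with a toy greedy merge. This is a sketch only and not the assembler used for the fly genome: it needs exact suffix/prefix matches, ignores sequencing errors, repeats and the BAC-end scaffold, and the reads are made-up strings. The function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Made-up reads: three overlap, one does not and so stays a separate contig.
toy_reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC", "CCCCCC"]
toy_contigs = greedy_assemble(toy_reads)
print(toy_contigs)
```

The read with no overlap ends up as its own contig, which is the toy equivalent of a gap between contigs that has to be bridged by other data.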
Jamboree
Week 1 (Nov. 1999) Annotators work on first pass data
Week 2 (Nov. 1999) Specialist annotators move in
new analyses run
learn how to look at data
Week 3 (Jan. 2000) dataset stable
Drosophila melanogaster
Gene prediction
Genefinder
Genie
Anything found by both is high priority
Anything found by one or the other is lower priority
Genie was rated as more accurate
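The prioritisation rule above amounts to a set intersection and a symmetric difference. The sketch below assumes each prediction can be reduced to a comparable key, here a hypothetical `(scaffold, start, end)` tuple with invented coordinates. Real predictions from two programs rarely share exact boundaries, so a practical comparison would need some tolerance for partial overlap. The function name `prioritise` is illustrative.

```python
def prioritise(genefinder_hits, genie_hits):
    """Split predictions into high priority (found by both programs)
    and lower priority (found by only one of them)."""
    return {
        "high": genefinder_hits & genie_hits,   # intersection
        "lower": genefinder_hits ^ genie_hits,  # symmetric difference
    }


# Invented example coordinates: (scaffold, start, end)
genefinder = {("2L", 1000, 2500), ("2L", 7000, 9100), ("3R", 400, 1800)}
genie = {("2L", 1000, 2500), ("3R", 400, 1800), ("X", 5200, 6000)}

ranked = prioritise(genefinder, genie)
print("High priority:", sorted(ranked["high"]))
print("Lower priority:", sorted(ranked["lower"]))
```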
Drosophila melanogaster
Annotate
All ORF/gene predictions are tested by BLAST against
EST databases
Protein databases
for