1 / 31
文档名称:

Algorithms for Molecular Biology - Hidden Markov Models - Shamir.pdf

格式:pdf   页数:31
下载后只包含 1 个 PDF 格式的文档,没有任何的图纸或源代码,查看文件列表

如果您已付费下载过本站文档,您可以点这里二次下载

Algorithms for Molecular Biology - Hidden Markov Models - Shamir.pdf

上传人:kuo08091 2014/4/3 文件大小:0 KB

下载得到文件列表

Algorithms for Molecular Biology - Hidden Markov Models - Shamir.pdf

文档介绍

文档介绍:Algorithms for Molecular Biology Fall Semester, 2001
Lecture 5: December 13, 2001
Lecturer: Ron Shamir Scribe: Roi Yehoshua and Oren Danewitz 1
Hidden Markov Models
Preface: CpG islands
CpG is a pair of nucleotides C and G, appearing essively, in this order, along one DNA
strand. It is known that due to biochemical considerations CpG is relatively rare in most
DNA sequences [4]. However, in particular short subsequences, which are several hundreds of
nucleotides long, the couple CpG is more frequent. These subsequences, called CpG islands,
are known to appear in biologically more significant parts of the genome, such as around
the promoters or ’start’ regions of many genes. The ability to identify these CpG islands in
the DNA will therefore help us spot the more significant regions of interest along the genome.
We will consider two problems involving CpG islands : First, given a short genome se-
quence, decide if es from a CpG island or not. Second, given a long DNA sequence,
locate all the CpG islands in it.
Reminder: Markov chains
Definition A Markov chain is a triplet (Q, {p(x1 = s)},A), where:
• Q is a finite set of states. Each state corresponds to a symbol in the alphabet Σ.
• p is the initial state probabilities.
• A is the state transition probabilities, denoted by ast for each s, t ∈ Q.
For each s, t ∈ Q the transition probability is:
ast ≡ P (xi = t|xi−1 = s) ()
1This scribe is partially based on Ophir Gvirtzer’s and Zohar Ganon’s scribe from Fall Semester 2000.
2 Algorithms for Molecular Biology c Tel Aviv Univ.
Figure : Source: [4]. A Markov chain for modeling a DNA sequence. B and ε are the
begin and end states, respectively.
We assume that X =(x1,... ,xL) is a random process with a memory of length 1, .,
the value of the random variable xi depends only on its predecessor xi−1. Formally we can
write:
∀s1,... ,s ∈Σ P (x = s |x1 = s1,... ,x−1 = s −1)=
i i i i i ()
|
= P (xi = si xi−1