文档介绍:[ Team LiB ]
BLASTN Protocols
As we said earlier, most searches can be categorized as either mapping or exploring searches. When sequences are expected to be nearly identical, you should use the +1/-3 match-mismatch parameters, which have a target frequency of 99 percent identity. Cross-species exploration requires a change in the scoring parameters and word size. We like +1/-1 for both its simplicity and its 75 percent identity target frequency. The choice of word size depends on balancing sensitivity and specificity. The default word size of 11 is too risky; use 9, which corresponds to a little more stringency than three identical amino acids because there's no allowance for degenerate codons. The choice of gap costs depends on the size of the expected gap. For simulating sequencing errors, the gap costs should be uniform and relatively high, but for modeling amino acid gaps or nucleotide hybridization bubbles, the cost of extension should be lower.
Mapping Oligos to a Genome
Many kinds of experiments, both molecular putational, employ short nucleotide sequences called oligonucleotides, or just oligos (oligo is Greek for few). For example, the polymerase chain reaction (PCR) is a routine laboratory procedure for amplifying a specific nucleotide sequence from DNA or indirectly from RNA. In PCR, oligos are used as templates for DNA replication, and the subsequence between the oligos is amplified. The most important feature of oligos is they may be short enough to give rise to many false-positive matches. In a test tube, we would say the oligo hybridizes nonspecifically, and in a BLAST experiment, we would say the alignments have high expectations.
Approach
Our goal here is to simulate the interaction between an oligo and a genome in a test tube. The thermodynamics of annealing plex and depending on the conditions of the experiment (temperature, salt concentration, length, position of oligo), some mismatches between the sequences and even ga