Monday, 9 May 2011

Been busy, really busy doing bioinformatics

RNA Structure and Prediction
Computational Molecular Biology (BIO502)
M. Nelson and S. Istrail


RNA folding

  • RNA is transcribed (or synthesized) in cells as single strands of (ribose) nucleic acids. However, these sequences are not simply long strands of nucleotides. Rather, intra-strand base pairing produces structures such as the one shown below. In RNA, guanine and cytosine pair (GC) through three hydrogen bonds, and adenine and uracil pair (AU) through two hydrogen bonds; additionally, guanine and uracil can form a weaker wobble base pair (GU).
    The stability of a particular secondary structure is a function of several constraints:
    1. The number of GC versus AU and GU base pairs.
      (Pairs held by more hydrogen bonds are stronger and form more stable structures.)
    2. The number of base pairs in a stem region.
      (Longer stems result in more bonds.)
    3. The number of base pairs in a hairpin loop region.
      (Formation of loops with more than 10 or fewer than 5 bases requires more energy.)
    4. The number of unpaired bases, whether interior loops or bulges.
      (Unpaired bases decrease the stability of the structure.)
    The stability of a secondary structure is quantified as the amount of free energy released or used by forming base pairs. A positive free energy means work is required to form a configuration; a negative free energy means stored energy is released. Free energies are additive, so one can determine the total free energy of a secondary structure by adding all the component free energies (units are kilocalories per mole). The more negative the free energy of a structure, the more likely that structure is to form, because more stored energy is released. This fact is used to predict the secondary structure of a particular sequence. Discovering a base pair configuration with the minimum possible free energy is the goal of most secondary structure prediction algorithms.
  • To compute the minimum free energy of a sequence, empirical energy parameters are used. These parameters summarize the free energy change (positive or negative) associated with all possible pairing configurations, including base pair stacks and internal base pairs, internal, bulge and hairpin loops, and various motifs which are known to occur with great frequency. Zuker has online tables of free energy and enthalpy values for various motifs.
  • Four major classes of RNA exist, and can be found in most organisms:
    1. mRNA - messenger RNA, a sequence which codes for one or more proteins.
    2. tRNA - transfer RNA, small (~80 bases) sequences which bring amino acids to the ribosome, where mRNA is translated into amino acid sequences.
    3. rRNA - ribosomal RNA sequences form ribosomes (along with ribosomal proteins).
      (You can read more about the first three classes by clicking here.)
    4. viral RNA (You can see some viral RNA structures here and here.)
  • It is important to note that most RNA folding algorithms predict only secondary, rather than tertiary, structure. The three-dimensional shape of the molecule is important to molecular function, but is harder to predict. This is because tertiary structure is known from crystallography for only tRNA sequences (as illustrated at the top of this page). Secondary structure is usually considered a sufficient approximation until more is known about the tertiary structure of RNA.
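
To make the additivity of free energies concrete, here is a minimal Python sketch that sums per-pair contributions for a structure that is already given. It is only an illustration under loud assumptions: the energy values in PAIR_ENERGY are made-up toy numbers (not the empirical parameters mentioned above), loop and stacking terms are ignored entirely, and the structure is written in dot-bracket notation, a convention that these notes do not themselves use. The names parse_pairs and toy_free_energy are hypothetical.

```python
# Toy pair energies in kcal/mol. These numbers are illustrative assumptions,
# NOT measured thermodynamic parameters.
PAIR_ENERGY = {
    frozenset("GC"): -3.0,
    frozenset("AU"): -2.0,
    frozenset("GU"): -1.0,
}


def parse_pairs(structure):
    """Return the (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = []
    for idx, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(idx)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % idx)
            pairs.append((stack.pop(), idx))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def toy_free_energy(sequence, structure):
    """Sum the toy energy of every base pair in the structure."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    total = 0.0
    for i, j in parse_pairs(structure):
        key = frozenset((sequence[i], sequence[j]))
        if key not in PAIR_ENERGY:
            raise ValueError("%s-%s is not an allowed pair" % (sequence[i], sequence[j]))
        total += PAIR_ENERGY[key]
    return total


if __name__ == "__main__":
    # A three-pair GC stem closing a four-base hairpin loop.
    print(toy_free_energy("GGGAAAACCC", "(((....)))"))
```

A realistic calculation would add terms for stacking and for hairpin, bulge, and interior loops from the empirical tables; the sketch only shows that the total is a sum of components.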

Predicting RNA secondary structure


Related Sites


References

Abrahams, J.P., M. van den Berg, E. van Batenburg, and C. Pleij. 1990. Prediction of RNA secondary structure, including pseudo-knotting by computer simulation. Nucleic Acids Research 18:3035-3044.
Gesteland, R.F., and J.F. Atkins, eds. 1993. The RNA World. Cold Spring Harbor Laboratory Press.
Gruner, W., R. Giegerich, D. Strothmann, C. Reidys, J. Weber, I. Hofacker, P. Stadler, and P. Schuster. Analysis of RNA sequence structure maps by exhaustive enumeration. Santa Fe Institute Preprint 95-10-099.
Jaeger, J.A., D.H. Turner and M. Zuker. 1990. Predicting optimal and suboptimal secondary structure for RNA, in "Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences", R.F. Doolittle ed., Methods in Enzymology 183, 281-306.
McCaskill, J.S. 1990. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 1105-19.
Stolorz, P., M. Huynen, I. Hofacker, and P. Stadler. RNA folding on massively parallel computers. Santa Fe Institute preprint 95-10-089.
Turner, D.H., N. Sugimoto, and S.M. Freier. 1988. RNA structure prediction. Ann. Rev. Biophys. and Biophys. Chem. 17: 167-192.
Waterman, M.S., and T.H. Byers. 1985. A dynamic programming algorithm to find all solutions in a neighborhood of the optimum. Mathematical Biosciences, 77, 179-188.
Williams, A.L., and I. Tinoco Jr. 1986. A dynamic programming algorithm for finding alternate RNA secondary structures. Nucleic Acids Research, 14, 299-315.
Zuker, M. 1989. On finding all suboptimal foldings of an RNA molecule. Science 244:48-52.


Lecture notes compiled by P. Hraber, May 1996
Please send comments, additions, and corrections to him.
