I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.
Albert Einstein
Every gun that is made, every warship launched, every rocket fired signifies, in the final sense, a theft from those who hunger and are not fed, those who are cold and are not clothed. This world in arms is not spending money alone. It is spending the sweat of its laborers, the genius of its scientists, the hopes of its children. This is not a way of life at all in any true sense. Under the clouds of war, it is humanity hanging on a cross of iron.
Dwight Eisenhower 1953 speech
Ready to learn and understand more about bioinformatics: what it means and how it is applied in science.
Friday, 16 December 2011
Thursday, 4 August 2011
Artificial intelligence course for those interested
Stanford CS221: Introduction to Artificial Intelligence
Professors Sebastian Thrun and Peter Norvig
Overview
CS221 is the introductory course in the field of Artificial Intelligence at Stanford University. It covers basic elements of AI, such as knowledge representation, inference, machine learning, planning and game playing, information retrieval, and computer vision and robotics. CS221 is a broad course that aims to teach students the very basics of modern AI. It is a prerequisite to many other, more specialized AI classes at Stanford University.
Instructors
Who Should Attend?
With an in-class enrollment of nearly 200 students, CS221 is one of the largest courses taught at Stanford University, across all departments and all disciplines. It is included in the core curriculum of several degree programs at Stanford. The course is tailored towards advanced undergraduate or early graduate students, new to Artificial Intelligence, who wish to learn about the excitement in the field. The course introduces a wealth of topics in AI, many of which are then the subject of more specialized follow-on classes at Stanford.
This version of CS221 will also be offered online. Using some new technology, the instructors will offer materials used in this class to online students, free of charge. It is their objective to offer identical homework assignments, quizzes, and exams in both versions of this course. Students taking the online version will therefore be graded according to the same grading criteria as students taking CS221 at Stanford. However, to receive Stanford credit, the course has to be taken through Stanford, and students have to be registered at Stanford University. Online students will only get a certificate in the name of the instructors, but no official Stanford certificate.
Course Description
This course is 10 weeks long. The in-class version starts Tue, Sept 27. The online version begins Mon, Oct 2, 2011. The course consists of
Passing Requirements
To pass this course, you have to attend (or watch online) all lectures. You have to turn in all homework assignments and exams. We grant a total of six "late days" which can be used to turn an assignment or an exam in late. Stanford has a strong Honor Code. We expect you to honor this code. Violations may lead to disciplinary action against you.
Prerequisites
A solid understanding of probability and linear algebra will be required.
Friday, 8 July 2011
This is such an amazing discovery!
HIV Mutates to Death With New Drug
Eric Bland, Discovery News
Feb. 9, 2009 -- HIV is notorious for its ability to mutate and evade drugs designed to destroy it. Now scientists are testing a new drug that actually speeds up that rate of change in the hope that the deadly virus will mutate itself to death.
"The HIV virus is so dependent on mutation that it really lives on the edge of existence," said John Reno, Chief Operating Officer for Koronis Pharmaceuticals, the company developing a drug called KP-1461. "But we figured that if we could increase this mutation rate, [HIV] might finally fall off that edge."
KP-1461 is a mutagen, meaning it encourages mutation, and has been in development for several years by the scientists at Koronis Pharmaceuticals.
When any cell or virus reproduces, there are inevitable mistakes, or mutations, as the four building blocks of DNA pair together into a double helix. Usually, the base adenine pairs up with the base thymine, and one called guanine pairs with cytosine.
KP-1461 looks like both thymine and cytosine, and will occasionally replace one of the normal bases in DNA, causing more errors.
"It really mucks up the genetic information inside the viral DNA," said Reno.
Disrupting HIV's replication doesn't directly destroy the virus, however, at least not immediately. It's the build-up of genetic mistakes that finally destroys it.
Wednesday, 29 June 2011
transposons
Transposons: Mobile DNA
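Transposons are segments of DNA that can move to different positions in the genome of a single cell. In the process, they may
- cause mutations
- increase (or decrease) the amount of DNA in the genome of the cell, and if the cell is the precursor of a gamete, in the genomes of any descendants.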
There are two distinct types:
- Class II transposons. These consist of DNA that moves directly from place to place.
- Class I transposons. These are retrotransposons that
- first transcribe the DNA into RNA and then
- use reverse transcriptase to make a DNA copy of the RNA to insert in a new location.
Class II Transposons
Class II transposons move by a "cut and paste" process: the transposon is cut out of its location (like command/control-X on your computer) and inserted into a new location (command/control-V). This process requires an enzyme, a transposase, that is encoded within some of these transposons.
The transposase binds to
- both ends of the transposon, which consist of inverted repeats; that is, identical sequences reading in opposite directions.
- a sequence of DNA that makes up the target site. Some transposases require a specific sequence as their target site; others can insert the transposon anywhere in the genome.
After the transposon is ligated to the host DNA, the gaps are filled in by Watson-Crick base pairing. This creates identical direct repeats at each end of the transposon.
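The way gap filling produces direct repeats can be sketched in a few lines of Python. This is only an illustration on a single strand written as a string. It assumes the target site is cut in a staggered (offset) fashion, so that the short target sequence ends up copied on both sides of the insert. The function name `staggered_insert` and its parameters are made up for this sketch.

```python
def staggered_insert(genome: str, transposon: str, site: int, stagger: int) -> str:
    """Insert `transposon` at `site`, duplicating `stagger` bases of target DNA.

    Assumption: the target is cut with an offset of `stagger` bases. Filling
    the resulting single-stranded gaps copies the target sequence, so the
    same short sequence flanks the transposon on both sides.
    """
    if stagger < 0 or not 0 <= site <= len(genome) - stagger:
        raise ValueError("target site does not fit inside the genome")
    target = genome[site:site + stagger]
    return (
        genome[:site]
        + target          # left direct repeat
        + transposon
        + target          # right direct repeat (filled-in copy)
        + genome[site + stagger:]
    )


# Lower-case letters mark the inserted element so it is easy to spot.
result = staggered_insert("AAACCGGTTT", "xxxx", site=3, stagger=4)
print(result)  # AAACCGGxxxxCCGGTTT
```

The target sequence CCGG now appears on both sides of the element, which is the pattern of identical direct repeats described above.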
Often transposons lose their gene for transposase. But as long as somewhere in the cell there is a transposon that can synthesize the enzyme, their inverted repeats are recognized and they, too, can be moved to a new location.
Miniature Inverted-repeat Transposable Elements (MITEs)
The recent completion of the genome sequences of rice and C. elegans has revealed that their genomes contain thousands of copies of a recurring motif consisting of
- almost identical sequences of about 400 base pairs flanked by
- characteristic inverted repeats of about 15 base pairs such as
5' GGCCAGTCACAATGG..~400 nt..CCATTGTGACTGGCC 3'
3' CCGGTCAGTGTTACC..~400 nt..GGTAACACTGACCGG 5'
MITEs are too small to encode any protein. They are presumably moved by larger transposons that
- do encode the necessary enzyme and
- recognize the same inverted repeats.
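"Inverted repeat" has a precise meaning that is easy to check by hand or in code: read 5' to 3', the right-hand end is the reverse complement of the left-hand end. A small Python sketch using the two 15-base ends from the example above (the helper names are made up for illustration):

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def are_inverted_repeats(left: str, right: str) -> bool:
    """True if `right` is the reverse complement of `left`."""
    return reverse_complement(left) == right


left_end = "GGCCAGTCACAATGG"   # 5' end of the top strand
right_end = "CCATTGTGACTGGCC"  # 3' end of the top strand

print(reverse_complement(left_end))               # CCATTGTGACTGGCC
print(are_inverted_repeats(left_end, right_end))  # True
```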
MITEs have also been found in the genomes of humans, Xenopus, and apples.
Transposons in maize
The first transposons were discovered in the 1940s by Barbara McClintock who worked with maize (Zea mays, called "corn" in the U.S.). She found that they were responsible for a variety of types of gene mutations, usually
- insertions and deletions (indels)
- translocations
Some of the mutations (c, bz) used as examples of how gene loci are mapped on the chromosome were caused by transposons.
It took about 40 years for other scientists to fully appreciate the significance of Barbara McClintock's discoveries. She was finally awarded a Nobel Prize in 1983.
Transposons in Drosophila
P elements are Class II transposons found in Drosophila. They do little harm because expression of their transposase gene is usually repressed. However, when male flies with P elements mate with female flies lacking them, the transposase becomes active in the germline, producing so many mutations that their offspring are sterile.
In nature this is no longer a problem. P elements seem to have first appeared in Drosophila melanogaster about 50 years ago. Since then, they have spread through every population of the species. Today flies lacking P elements can only be found in old strains maintained in the laboratory.
P elements have provided valuable tools for Drosophila geneticists. Transgenic flies containing any desired gene can be produced by injecting the early embryo with an engineered P element containing that gene.
Other transposons are being studied for their ability to create transgenic insects of agricultural and public health importance.
Transposons in bacteria
Some transposons in bacteria carry, in addition to the gene for transposase, genes for one or more (usually more) proteins imparting resistance to antibiotics. When such a transposon is incorporated in a plasmid, it can leave the host cell and move to another. This is the way that the alarming phenomenon of multidrug antibiotic resistance spreads so rapidly.
Transposition in these cases occurs by a "copy and paste" (command/control-C -> command/control-V) mechanism. This requires an additional enzyme, a resolvase, that is also encoded in the transposon itself. The original transposon remains at the original site while its copy is inserted at a new site.
Retrotransposons
Retrotransposons also move by a "copy and paste" mechanism, but in contrast to the transposons described above, the copy is made of RNA, not DNA. The RNA copies are then transcribed back into DNA, using a reverse transcriptase, and these are inserted into new locations in the genome.
Many retrotransposons have long terminal repeats (LTRs) at their ends that may contain over 1000 base pairs in each.
Like DNA transposons, retrotransposons generate direct repeats at their new sites of insertion. In fact, it is the presence of these direct repeats that often is the clue that the intervening stretch of DNA arrived there by retrotransposition. 42% of the entire human genome consists of retrotransposons.
HIV-1
HIV-1, the cause of AIDS, and other human retroviruses (e.g., HTLV-1, the human T-cell leukemia virus) behave like retrotransposons. The RNA genome of HIV-1 contains a gene for
- reverse transcriptase and one for
- integrase. The integrase serves the same function as the transposases of DNA transposons. The DNA copies can be inserted anywhere in the genome.
LINEs (Long interspersed elements)
- The human genome contains some 868,000 LINEs (representing ~17% of the genome).
- Most of these belong to a family called LINE-1 (L1).
- These L1 elements are DNA sequences that range in length from a few hundred to as many as 9,000 base pairs.
- Only about 50 L1 elements are functional "genes"; that is, can be transcribed and translated.
- The functional L1 elements are about 6,500 bp in length and encode three proteins, including
- an endonuclease that cuts DNA and a
- reverse transcriptase that makes a DNA copy of an RNA transcript.
- L1 activity proceeds as follows:
- RNA polymerase II transcribes the L1 DNA into RNA.
- The RNA is translated by ribosomes in the cytoplasm into the proteins.
- The proteins and RNA join together and reenter the nucleus.
- The endonuclease cuts a strand of "target" DNA, often in the intron of a gene.
- The reverse transcriptase copies the L1 RNA into L1 DNA which is inserted into the target DNA forming a new L1 element there.
The diversity of LINEs between individual human genomes makes them useful markers for DNA "fingerprinting".
Variation occurs in the length of L1 elements:
- Transcription of an active L1 element sometimes continues downstream into additional DNA producing a longer transposed element.
- Reverse transcription of L1 RNA often concludes prematurely and produces a shortened transposed element.
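The copy step, and the shortened copies that result from a premature stop, can be sketched in Python. This is a deliberately simplified model: it tracks only the sense strand as a string, the function names and the `copied_bases` parameter are made up for illustration, and it assumes reverse transcription starts at the 3' end of the RNA, so an early stop yields a copy that is missing its 5' end.

```python
from typing import Optional


def transcribe(dna_sense: str) -> str:
    """RNA polymerase step: the RNA has the sense-strand sequence with U for T."""
    return dna_sense.replace("T", "U")


def reverse_transcribe(rna: str, copied_bases: Optional[int] = None) -> str:
    """Return the new DNA copy (sense orientation) made from `rna`.

    Assumption: copying starts at the 3' end of the RNA. If the enzyme stops
    after `copied_bases` bases, only the 3'-most part of the element is copied.
    """
    n = len(rna) if copied_bases is None else max(0, min(copied_bases, len(rna)))
    return rna[len(rna) - n:].replace("U", "T")


element = "ATGGCTTAA"               # stand-in for an L1 element, not a real sequence
rna = transcribe(element)           # AUGGCUUAA
full_copy = reverse_transcribe(rna)
short_copy = reverse_transcribe(rna, copied_bases=4)

print(full_copy)   # ATGGCTTAA
print(short_copy)  # TTAA
```

The original element is untouched in both cases, which is what makes this "copy and paste" rather than "cut and paste".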
SINEs (Short interspersed elements)
SINEs are short DNA sequences (100–400 base pairs) that represent reverse-transcribed RNA molecules originally transcribed by RNA polymerase III; that is, molecules of tRNA, 5S rRNA, and some other small nuclear RNAs. The most abundant SINEs are the Alu elements. There are over one million copies in the human genome (representing 10.6% of our total DNA).
Alu elements consist of a sequence of 300 base pairs containing a site that is recognized by the restriction enzyme AluI. They appear to be reverse transcripts of 7S RNA, part of the signal recognition particle.
Most SINEs do not encode any functional molecules and depend on the machinery of active L1 elements to be transposed; that is, copied and pasted in new locations.
Transposons and Mutations
Transposons are mutagens. They can cause mutations in several ways:
- If a transposon inserts itself into a functional gene, it will probably damage it. Insertion into exons, introns, and even into DNA flanking the genes (which may contain promoters and enhancers) can destroy or alter the gene's activity.
  The insertion of a retrotransposon in the DNA flanking a gene for pigment synthesis is thought to have produced white grapes from a black-skinned ancestor. Later, the loss of that retrotransposon produced the red-skinned grape varieties cultivated today.
- Faulty repair of the gap left at the old site (in cut and paste transposition) can lead to mutation there.
- The presence of a string of identical repeated sequences presents a problem for precise pairing during meiosis. How is the third, say, of a string of five Alu sequences on the "invading strand" of one chromatid going to ensure that it pairs with the third sequence in the other strand? If it accidentally pairs with one of the other Alu sequences, the result will be an unequal crossover — one of the commonest causes of duplications.
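The unequal crossover in the last point can be made concrete with a toy model in Python. Each chromatid is just a list of repeat labels, and the exchange is placed at the boundary of the mispaired repeats. That is a simplification, and the function name and labels are made up for this sketch.

```python
from typing import List, Tuple


def unequal_crossover(a: List[str], b: List[str], i: int, j: int) -> Tuple[List[str], List[str]]:
    """Cross over chromatids `a` and `b` with repeat a[i] paired to repeat b[j].

    Indices are 0-based. When i == j the pairing is correct and both products
    keep the original number of repeats. When i != j one product gains
    repeats (a duplication) and the other loses them (a deletion).
    """
    product_1 = a[:i] + b[j:]
    product_2 = b[:j] + a[i:]
    return product_1, product_2


chromatid_a = [f"A{n}" for n in range(1, 6)]  # five Alu repeats: A1..A5
chromatid_b = [f"B{n}" for n in range(1, 6)]  # five Alu repeats: B1..B5

# The third repeat of one chromatid mispairs with the second of the other.
longer, shorter = unequal_crossover(chromatid_a, chromatid_b, i=2, j=1)
print(longer)   # ['A1', 'A2', 'B2', 'B3', 'B4', 'B5']
print(shorter)  # ['B1', 'A3', 'A4', 'A5']
```

One product now carries six repeats and the other four, even though each parent chromatid started with five.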
Transposon insertions have been found to cause some cases of human genetic diseases, including
- Hemophilia A [Factor VIII gene] and Hemophilia B [Factor IX gene]
- X-linked severe combined immunodeficiency (SCID) [gene for part of the IL-2 receptor]
- porphyria
- predisposition to colon polyps and cancer [APC gene]
- Duchenne muscular dystrophy [dystrophin gene]
What good are transposons?
We don't know. They have been called "junk" DNA and "selfish" DNA.
- "selfish" because their only function seems to be to make more copies of themselves and
- "junk" because there is no obvious benefit to their host.
Retrotransposons cannot be so selfish that they reduce the survival of their host. Perhaps, they even confer some benefit.
Some possibilities:
- Retrotransposons often carry some additional sequences at their 3' end as they insert into a new location. Perhaps these occasionally create new combinations of exons, promoters, and enhancers that benefit the host. Example:
- Thousands of our Alu elements occur in the introns of genes.
- Some of these contain sequences that when transcribed into the primary transcript are recognized by the spliceosome.
- These can then be spliced into the mature mRNA, creating a new exon, which will be translated into a new protein product.
- Alternative splicing can provide not only the new mRNA (and thus protein) but also the old.
- In this way, nature can try out new proteins without the risk of abandoning the tried-and-true old one.
- L1 elements inserted into the introns of functional genes reduce the transcription of those genes without harming the gene product — the longer the L1 element, the lower the level of gene expression. Some 79% of our genes contain L1 elements, and perhaps they are a mechanism for establishing the baseline level of gene activity.
- Telomerase, the enzyme essential for maintaining chromosome length, is closely related to the reverse transcriptase of LINEs and may have evolved from it.
- RAG-1 and RAG-2. The proteins encoded by these genes are needed to assemble the repertoire of antibodies and T-cell receptors (TCRs) used by the adaptive immune system. The mechanism resembles that of the cut and paste method of Class II transposons, and the RAG genes may have evolved from them. If so, the event occurred some 450 million years ago when the jawed vertebrates evolved from jawless ancestors. Only jawed vertebrates have the RAG-1 and RAG-2 genes.
- In Drosophila, the insertion of transposons into genes has been linked to the development of resistance to DDT and organophosphate insecticides.
Transposons and the C-value Paradox
- The genome of Arabidopsis thaliana contains ~1.2 x 10^8 base pairs (bp) of DNA. About 14% of this consists of transposons; the rest functional genes (about 28,000 of them).
- The maize (corn) genome contains 20 times more DNA (2.4 x 10^9 bp) but surely has no need for 20 times as many genes. In fact, 60% of the corn genome is made up of transposons. (The figure for humans is 42%.)
- Most of the 2.5 x 10^11 bp of DNA in the genome of Psilotum nudum is presumably "junk" DNA.
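The arithmetic behind the first two points is easy to check. The short Python sketch below uses only the genome sizes and transposon percentages quoted above; the variable and function names are made up for illustration.

```python
# (total base pairs, percent of the genome made up of transposons)
GENOMES = {
    "Arabidopsis thaliana": (120_000_000, 14),
    "maize": (2_400_000_000, 60),
}


def non_transposon_bp(total_bp: int, transposon_percent: int) -> int:
    """Base pairs left once the transposon fraction is subtracted."""
    return total_bp * (100 - transposon_percent) // 100


arabidopsis_bp, arabidopsis_pct = GENOMES["Arabidopsis thaliana"]
maize_bp, maize_pct = GENOMES["maize"]

size_ratio = maize_bp / arabidopsis_bp
rest_ratio = non_transposon_bp(maize_bp, maize_pct) / non_transposon_bp(
    arabidopsis_bp, arabidopsis_pct
)

print(f"total DNA, maize vs Arabidopsis: {size_ratio:.1f}x")
print(f"non-transposon DNA, maize vs Arabidopsis: {rest_ratio:.1f}x")
```

With these figures the 20-fold difference in total DNA shrinks to roughly 9-fold once transposons are set aside, so transposons account for a large part of the gap but not all of it.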
13 June 2011
Monday, 30 May 2011
Today I'm doing jokes.
The average woman would rather have beauty than brains, because the average man can see better than he can think... What do you think?
Friday, 20 May 2011
basic rna structure
Most cellular RNA molecules are single stranded. They may form secondary structures such as stem-loop and hairpin.
Figure 3-C-1. Secondary structure of RNA. (a) stem-loop. (b) hairpin.
The major role of RNA is to participate in protein synthesis, which requires three classes of RNA:
- messenger RNA (mRNA)
- transfer RNA (tRNA)
- ribosomal RNA (rRNA)
Other classes of RNA include
- Ribozymes: RNA molecules with catalytic activity.
- Small RNA molecules: involved in RNA interference and other functions.
Monday, 9 May 2011
Been busy, really busy doing bioinformatics
RNA Structure and Prediction
Computational Molecular Biology (BIO502)
M. Nelson and S. Istrail
RNA folding
- RNA is transcribed (or synthesized) in cells as single strands of (ribose) nucleic acids. However, these sequences are not simply long strands of nucleotides. Rather, intra-strand base pairing will produce structures such as the one shown below.
In RNA, guanine and cytosine pair (GC) by forming three hydrogen bonds, and adenine and uracil pair (AU) by forming two; additionally, guanine and uracil can form a non-canonical "wobble" base pair (GU) held by two hydrogen bonds.
The stability of a particular secondary structure is a function of several constraints:
- The number of GC versus AU and GU base pairs. (Higher energy bonds form more stable structures.)
- The number of base pairs in a stem region. (Longer stems result in more bonds.)
- The number of base pairs in a hairpin loop region. (Formation of loops with more than 10 or less than 5 bases requires more energy.)
- The number of unpaired bases, whether interior loops or bulges. (Unpaired bases decrease the stability of the structure.)
- To compute the minimum free energy of a sequence, empirical energy parameters are used. These parameters summarize free energy change (positive or negative) associated with all possible pairing configurations, including base pair stacks and internal base pairs, internal, bulge and hairpin loops, and various motifs which are known to occur with great frequency. Zuker has online tables of free energy and enthalpy values for various motifs.
- Four major classes of RNA exist, and can be found in most organisms:
- mRNA - messenger RNA, is a sequence which codes for formation of one or more proteins.
- tRNA - transfer RNA, small (~80 bases) sequences which bring amino acids to the ribosome, where they translate mRNA into amino acid sequences.
- rRNA - ribosomal RNA sequences form ribosomes (along with ribosomal proteins).
- viral RNA
- It is important to note that most RNA folding algorithms predict only secondary, rather than tertiary structure. The three-dimensional shape of the molecule is important to molecular function, but is harder to predict. This is because tertiary structure is known from crystallography for only tRNA sequences (as illustrated at the top of this page). Secondary structure is usually considered a sufficient approximation, until more is known about tertiary structure of RNA.
Predicting RNA secondary structure
- Several representations of secondary structure have been utilized, each with different advantages. The planar graph representation shown above gives an intuition for the shape of an RNA sequence, but the same structure could also be represented in string notation. In string notation, balanced parentheses are used to indicate paired bases, and periods are used to indicate unpaired bases. The secondary structure in the above figure is given as ((((((((((((((....)))))))))))))) in string notation. For a discussion of the advantages of string notation, and examples of other representation schemes, see Hofacker et al. (1995) and Gruner et al. (1995).
- The number of possible secondary structures (S) of n bases with k base pairs is given as
- A number of strategies for predicting secondary structure have been developed. Gruner et al. provide a taxonomy of folding algorithms, and references for each algorithm. Their table is summarized here: * algorithm can predict pseudo-knots
- The Waterman algorithm
- Now that we can find the minimum free energy structure of a sequence in computationally tractable time, we should ask "What does the optimum tell us?" That is, there may be more than one structure with the optimum free energy, or there may be many structures within 5% to 10% of the minimum free energy, and these may be topologically very different. A minimum energy folding algorithm will return only one secondary structure, though there are many candidates for the natural structure. To address this, some software packages (such as Zuker's mfold) will display a number of suboptimal folds. Inferring what structure is truly representative of the natural structure requires additional information. Phylogenetic information is often used to constrain the search by identifying highly conserved motifs. Some programs allow the user to specify constraints on the secondary structure, by specifying paired, single-stranded, or non-pairable regions, or by actively participating in the folding process.
- Of course, there are a number of limiting assumptions to existing folding algorithms. These include the kinetics of folding during transcription, the difficulty of predicting pseudo-knots, the role of chaperone proteins in folding, and the importance of modified bases (e.g. inosine or methylated bases). Some algorithms attempt to incorporate these considerations (e.g. see Abrahams et al. for predicting pseudo-knots). At best, RNA folding algorithms are first-order approximations used to infer the natural structure of a known sequence.
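To make the dynamic programming idea and the string notation concrete, here is a minimal Python sketch. It is not the minimum free energy method discussed above and it uses no energy tables. Instead it swaps in the simpler, well-known base-pair maximization approach (Nussinov-style), which finds a structure with the largest number of GC, AU and GU pairs. The function names and the `min_loop` parameter (the smallest allowed hairpin loop, set to 3 here as an assumption) are choices made for this sketch. Like the algorithms above, it cannot produce pseudo-knots.

```python
from typing import List, Tuple

# Allowed pairs: Watson-Crick plus the GU wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def fold(seq: str, min_loop: int = 3) -> Tuple[str, int]:
    """Return (dot-bracket structure, number of pairs) maximizing base pairs."""
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case 1: base j is unpaired
            for k in range(i, j - min_loop):       # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure (there may be several).
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    return "".join(structure), (best[0][n - 1] if n else 0)


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Turn string notation into a sorted list of (i, j) paired positions."""
    open_positions, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if open_positions:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


print(fold("GAAAC"))                # ('(...)', 1)
print(parse_dot_bracket("((..))"))  # [(0, 5), (1, 4)]
```

As with minimum energy folding, only one optimal structure is returned even when several structures tie, which is the limitation raised above. Counting pairs is a much cruder score than free energy, so the output should be read as an illustration of the recursion, not as a prediction.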
Related Sites
- RNA world at IMB Jena. This page contains links to databases and software, information about meetings, and a number of search utilities.
- A list of RNA related sites, compiled by Cambridge University Press.
- Image library of biological macromolecules, at IMB Jena has illustrations of molecular structure.
- Molecules R US, maintained by NIH, has a fancy interface to structural information in the Brookhaven Protein Database. You can use the interface to view molecules by a number of methods.
- Michael Zuker's rna page
- M. Zuker's interactive mfold server will fold sequences online.
- The Vienna RNA ftp site is located here.
- A list of folding software links, for a variety of platforms.
- Abstracts at the Institute for Theoretical Chemistry, in Vienna
- Abstracts at the Santa Fe Institute
References
Abrahams, J.P., M. van den Berg, E. van Batenburg, and C. Pleij. 1990. Prediction of RNA secondary structure, including pseudo-knotting by computer simulation. Nucleic Acids Research 18:3035-3044.
Gesteland, R.F., and J.F. Atkins, eds. 1993. The RNA World. Cold Spring Harbor Laboratory Press.
Gruner, W., R. Giegerich, D. Strothmann, C. Reidys, J. Weber, I. Hofacker, P. Stadler, and P. Schuster. Analysis of RNA sequence structure maps by exhaustive enumeration. Santa Fe Institute Preprint 95-10-099.
Jaeger, J.A., D.H. Turner and M. Zuker. 1990. Predicting optimal and suboptimal secondary structure for RNA, in "Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences", R.F. Doolittle ed., Methods in Enzymology 183, 281-306.
McCaskill, J.S. 1990. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 1105-19.
Stolorz, P., M. Huynen, I. Hofacker, and P. Stadler. RNA folding on massively parallel computers. Santa Fe Institute preprint 95-10-089.
Turner, D.H., N. Sugimoto, and S.M. Freier. 1988. RNA structure prediction. Ann. Rev. Biophys. and Biophys. Chem. 17: 167-192.
Waterman, M.S., and T.H. Byers. 1985. A dynamic programming algorithm to find all solutions in a neighborhood of the optimum. Mathematical Biosciences, 77, 179-188.
Williams, A.L., & Tinoco, I.Jr. 1986. A dynamic programming algorithm for finding alternate RNA secondary structures. Nucleic Acids Research, 14, 299-315.
Zuker, M. 1989. On finding all suboptimal foldings of an RNA molecule. Science 244:48-52.
Lecture notes compiled by P. Hraber, May 1996
Please send comments, additions, and corrections to him.
Tuesday, 19 April 2011
understanding bioinformatics
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.
Wednesday, 6 April 2011
Misconceptions about bioinformatics
What do you think when a professional tells you he/she is a bioinformatician? What's the first thought that crosses your mind?