Friday, 16 December 2011

war war war

I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.
Albert Einstein

Every gun that is made, every warship launched, every rocket fired signifies, in the final sense, a theft from those who hunger and are not fed, those who are cold and are not clothed. This world in arms is not spending money alone. It is spending the sweat of its laborers, the genius of its scientists, the hopes of its children. This is not a way of life at all in any true sense. Under the clouds of war, it is humanity hanging on a cross of iron.
Dwight Eisenhower 1953 speech

Thursday, 4 August 2011

Artificial intelligence course for those interested

Stanford CS221: Introduction to Artificial Intelligence

Professors Sebastian Thrun and Peter Norvig

Overview

CS221 is the introductory course in the field of Artificial Intelligence at Stanford University. It covers basic elements of AI, such as knowledge representation, inference, machine learning, planning and game playing, information retrieval, and computer vision and robotics. CS221 is a broad course that aims to teach students the very basics of modern AI. It is a prerequisite to many other, more specialized AI classes at Stanford University.

Instructors

Professors Peter Norvig and Sebastian Thrun took over CS221 from Professor Andrew Y. Ng in 2010. Peter Norvig is the author of the celebrated textbook Artificial Intelligence: A Modern Approach. He is also Director of Research at Google. Thrun is well known for his work on robotics and self-driving cars (his team won the DARPA Grand Challenge). Thrun is a research professor at Stanford and a Google Fellow. He is one of the youngest individuals ever elected into the National Academy of Engineering (at age 39).

Who Should Attend?

With an in-class enrollment of nearly 200 students, CS221 is one of the largest courses taught at Stanford University, across all departments and all disciplines. It is included in the core curriculum of several degree programs at Stanford. The course is tailored towards advanced undergraduate or early graduate students, new to Artificial Intelligence, who wish to learn about the excitement in the field. The course introduces a wealth of topics in AI, many of which are then the subject of more specialized follow-on classes at Stanford. This version of CS221 will also be offered online. Using some new technology, the instructors will offer materials used in this class to online students, free of charge. It is their objective to offer identical homework assignments, quizzes, and exams in both versions of this course. Students taking the online version will therefore be graded according to the same grading criteria as students taking CS221 at Stanford. However, to receive Stanford credit, the course has to be taken through Stanford, and students have to be registered at Stanford University. Online students will only get a certificate in the name of the instructors, but no official Stanford certificate.

Course Description

This course is 10 weeks long. The in-class version starts Tue, Sept 27. The online version begins Mon, Oct 2, 2011. The course consists of
Approximately 20 lectures. Each lecture includes quizzes that we ask you to do, but which are not counted towards the final grade of this class. Instead, you can see the right answer to each quiz right after submitting your answers.

Approximately 8 homework assignments. Those are just like our quizzes, and if you do well in the quizzes, you should do well in the assignments. However, we will show you the correct answers only after a delay of a few days, to discourage cheating.

One midterm and one final exam. These are like extended quizzes, covering all subject areas of the course discussed so far. The exams will also check your general knowledge about topics covered in the reading materials (the book).

The central objective is to teach basic methods in AI, and to convey enthusiasm for the field. AI has emerged as one of the most impactful disciplines in science and technology. Google, for example, is massively run on AI. Students passing this course should be proficient in the basic methods of AI, and have a broad overview of the field.

Passing Requirements

To pass this course, you have to attend (or watch online) all lectures. You have to turn in all homework assignments and exams. We grant a total of six "late days" which can be used to turn an assignment or an exam in late. Stanford has a strong Honor Code. We expect you to honor this code. Violations may lead to disciplinary action against you.

Prerequisites

A solid understanding of probability and linear algebra will be required.

Friday, 8 July 2011

this is such an amazing discovery!

HIV Mutates to Death With New Drug

Eric Bland, Discovery News

Feb. 9, 2009 -- HIV is notorious for its ability to mutate and evade drugs designed to destroy it. Now scientists are testing a new drug that actually speeds up that rate of change in the hope that the deadly virus will mutate itself to death.
"The HIV virus is so dependent on mutation that it really lives on the edge of existence," said John Reno, Chief Operating Officer for Koronis Pharmaceuticals, the company developing a drug called KP-1461. "But we figured that if we could increase this mutation rate, [HIV] might finally fall off that edge."
KP-1461 is a mutagen, meaning it encourages mutation, and has been in development for several years by the scientists at Koronis Pharmaceuticals.
When any cell or virus reproduces, there are inevitable mistakes, or mutations, as the four building blocks of DNA pair together into a double helix. Usually, the base adenine pairs up with the base thymine, and one called guanine pairs with cytosine.
KP-1461 looks like both thymine and cytosine, and will occasionally replace one of the normal bases in DNA, causing more errors.
"It really mucks up the genetic information inside the viral DNA," said Reno.
Disrupting HIV's replication doesn't directly destroy the virus, however, at least not immediately. It's the build-up of genetic mistakes that finally destroys it.

Wednesday, 29 June 2011

transposons

Transposons: Mobile DNA

Transposons are segments of DNA that can move around to different positions in the genome of a single cell. In the process, they may

cause mutations
increase (or decrease) the amount of DNA in the genome of the cell, and if the cell is the precursor of a gamete, in the genomes of any descendants.

These mobile segments of DNA are sometimes called "jumping genes".
There are two distinct types:

Class II transposons. These consist of DNA that moves directly from place to place.
Class I transposons. These are retrotransposons that
- first transcribe the DNA into RNA and then
- use reverse transcriptase to make a DNA copy of the RNA to insert in a new location.

Class II Transposons

Class II transposons move by a "cut and paste" process: the transposon is cut out of its location (like command/control-X on your computer) and inserted into a new location (command/control-V).
This process requires an enzyme — a transposase — that is encoded within some of these transposons.

Transposase binds to:

both ends of the transposon, which consist of inverted repeats; that is, identical sequences reading in opposite directions.
a sequence of DNA that makes up the target site. Some transposases require a specific sequence as their target site; others can insert the transposon anywhere in the genome.

The DNA at the target site is cut in an offset manner (like the "sticky ends" produced by some restriction enzymes).
After the transposon is ligated to the host DNA, the gaps are filled in by Watson-Crick base pairing. This creates identical direct repeats at each end of the transposon.
Often transposons lose their gene for transposase. But as long as somewhere in the cell there is a transposon that can synthesize the enzyme, their inverted repeats are recognized and they, too, can be moved to a new location.
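The way a staggered cut ends up as identical direct repeats on both sides of the transposon can be sketched in a few lines of Python. This is only an illustration on one strand written as a string: the function name, the host sequence, the lowercase transposon, and the 4-base stagger are all made up for the example.

```python
def insert_with_target_site_duplication(host, transposon, cut_pos, stagger):
    """Insert `transposon` at a staggered cut and fill the gaps.

    The `stagger` bases starting at `cut_pos` are the target site. After gap
    filling, one copy of the target site sits on each side of the transposon.
    """
    target = host[cut_pos:cut_pos + stagger]
    return (
        host[:cut_pos]
        + target          # left copy of the target site
        + transposon
        + target          # right copy, created by filling the gap
        + host[cut_pos + stagger:]
    )


host = "AAAATGCACCCC"          # hypothetical host DNA, target site TGCA
transposon = "ggccnnnnggcc"    # hypothetical element, lowercase to stand out
result = insert_with_target_site_duplication(host, transposon, 4, 4)
print(result)
```

The target site TGCA appears once in the host before insertion and twice afterwards, directly flanking the inserted element.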

Miniature Inverted-repeat Transposable Elements (MITEs)

The recent completion of the genome sequence of rice and C. elegans has revealed that their genomes contain thousands of copies of a recurring motif consisting of

almost identical sequences of about 400 base pairs flanked by
characteristic inverted repeats of about 15 base pairs such as
5' GGCCAGTCACAATGG..~400 nt..CCATTGTGACTGGCC 3'
3' CCGGTCAGTGTTACC..~400 nt..GGTAACACTGACCGG 5'
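A quick way to see what "inverted repeat" means is to check that the right end of the top strand is the reverse complement of the left end. The short Python sketch below does this for the two 15-base ends shown above; the function names are just illustrative.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left_end, right_end):
    """True if the two ends read identically in opposite directions."""
    return reverse_complement(left_end) == right_end


left_end = "GGCCAGTCACAATGG"    # 5' end of the top strand
right_end = "CCATTGTGACTGGCC"   # 3' end of the top strand
print(is_inverted_repeat(left_end, right_end))
```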

MITEs are too small to encode any protein. Just how they are copied and moved to new locations is still uncertain. Probably larger transposons that

do encode the necessary enzyme and
recognize the same inverted repeats

are responsible. There are over 100,000 MITEs in the rice genome (representing some 6% of the total genome). Some of the mutations found in certain strains of rice are caused by the insertion of a MITE in the gene.
MITEs have also been found in the genomes of humans, Xenopus, and apples.

Transposons in maize

The first transposons were discovered in the 1940s by Barbara McClintock, who worked with maize (Zea mays, called "corn" in the U.S.). She found that they were responsible for a variety of types of gene mutations, usually

insertions and deletions (indels)
translocations

Some of the mutations (c, bz) used as examples of how gene loci are mapped on the chromosome were caused by transposons.

In developing somatic tissues like corn kernels, a mutation (e.g., c) that alters color will be passed on to all the descendant cells. This produces the variegated pattern which is so prized in "Indian corn". (Photo courtesy of Whalls Farms.)
It took about 40 years for other scientists to fully appreciate the significance of Barbara McClintock's discoveries. She was finally awarded a Nobel Prize in 1983.

Transposons in Drosophila

P elements are Class II transposons found in Drosophila. They do little harm because expression of their transposase gene is usually repressed. However, when male flies with P elements mate with female flies lacking them, the transposase becomes active in the germline producing so many mutations that their offspring are sterile.
In nature this is no longer a problem. P elements seem to have first appeared in Drosophila melanogaster about 50 years ago. Since then, they have spread through every population of the species. Today flies lacking P elements can only be found in old strains maintained in the laboratory.
P elements have provided valuable tools for Drosophila geneticists. Transgenic flies containing any desired gene can be produced by injecting the early embryo with an engineered P element containing that gene.
Other transposons are being studied for their ability to create transgenic insects of agricultural and public health importance.

Transposons in bacteria

Some transposons in bacteria carry — in addition to the gene for transposase — genes for one or more (usually more) proteins imparting resistance to antibiotics. When such a transposon is incorporated in a plasmid, it can leave the host cell and move to another. This is the way that the alarming phenomenon of multidrug antibiotic resistance spreads so rapidly. Transposition in these cases occurs by a "copy and paste" (command/control-C -> command/control-V) mechanism. This requires an additional enzyme — a resolvase — that is also encoded in the transposon itself. The original transposon remains at the original site while its copy is inserted at a new site.

Retrotransposons

Retrotransposons also move by a "copy and paste" mechanism but in contrast to the transposons described above, the copy is made of RNA, not DNA.
The RNA copies are then transcribed back into DNA — using a reverse transcriptase — and these are inserted into new locations in the genome.
Many retrotransposons have long terminal repeats (LTRs) at their ends that may contain over 1000 base pairs in each.
Like DNA transposons, retrotransposons generate direct repeats at their new sites of insertion. In fact, it is the presence of these direct repeats that often is the clue that the intervening stretch of DNA arrived there by retrotransposition. 42% of the entire human genome consists of retrotransposons.
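The "copy and paste" route through an RNA intermediate can be sketched as a toy Python example. It treats the genome as a plain string, uses made-up coordinates and a made-up element, and leaves out the direct repeats created at the insertion site, so it only shows that the original element stays put while a new DNA copy appears elsewhere.

```python
def transcribe(dna):
    """DNA to RNA, written on the same strand (T becomes U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna):
    """RNA back to DNA (U becomes T), the job of reverse transcriptase."""
    return rna.replace("U", "T")


def retrotranspose(genome, start, end, insert_at):
    """Copy genome[start:end] via RNA and paste the DNA copy at insert_at."""
    rna_copy = transcribe(genome[start:end])
    dna_copy = reverse_transcribe(rna_copy)
    return genome[:insert_at] + dna_copy + genome[insert_at:]


genome = "AAAA" + "GATTACA" + "CCCC"   # hypothetical element GATTACA
new_genome = retrotranspose(genome, 4, 11, 13)
print(new_genome)
```

Unlike the cut and paste mechanism, the genome grows by the length of the element each time this happens.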

HIV-1

HIV-1 — the cause of AIDS — and other human retroviruses (e.g., HTLV-1, the human T-cell leukemia virus) behave like retrotransposons. The RNA genome of HIV-1 contains a gene for

reverse transcriptase and one for
integrase. The integrase serves the same function as the transposases of DNA transposons. The DNA copies can be inserted anywhere in the genome.

Molecules of both enzymes are incorporated in the virus particle.

LINEs (Long interspersed elements)

The human genome contains some 868,000 LINEs (representing ~17% of the genome).
Most of these belong to a family called LINE-1 (L1).
These L1 elements are DNA sequences that range in length from a few hundred to as many as 9,000 base pairs.
Only about 50 L1 elements are functional "genes"; that is, can be transcribed and translated.
The functional L1 elements are about 6,500 bp in length and encode three proteins, including
- an endonuclease that cuts DNA and a
- reverse transcriptase that makes a DNA copy of an RNA transcript.
L1 activity proceeds as follows:
- RNA polymerase II transcribes the L1 DNA into RNA.
- The RNA is translated by ribosomes in the cytoplasm into the proteins.
- The proteins and RNA join together and reenter the nucleus.
- The endonuclease cuts a strand of "target" DNA, often in the intron of a gene.
- The reverse transcriptase copies the L1 RNA into L1 DNA which is inserted into the target DNA forming a new L1 element there.

Through this copy-paste mechanism, the number of LINEs can increase in the genome.
The diversity of LINEs between individual human genomes makes them useful markers for DNA "fingerprinting".
Variation occurs in the length of L1 elements:

Transcription of an active L1 element sometimes continues downstream into additional DNA producing a longer transposed element.
Reverse transcription of L1 RNA often concludes prematurely and produces a shortened transposed element.

While most L1 elements are not functional, they may play a role in regulating the efficiency of transcription of the gene in which they reside (see below). Occasionally, L1 activity makes and inserts a copy of a cellular mRNA (thus a natural cDNA). Lacking introns as well as the necessary control elements like promoters, these genes are not expressed. They represent one category of pseudogene.

SINEs (Short interspersed elements)

SINEs are short DNA sequences (100–400 base pairs) that represent reverse-transcribed RNA molecules originally transcribed by RNA polymerase III; that is, molecules of tRNA, 5S rRNA, and some other small nuclear RNAs. The most abundant SINEs are the Alu elements. There are over one million copies in the human genome (representing 10.6% of our total DNA).
Alu elements consist of a sequence of 300 base pairs containing a site that is recognized by the restriction enzyme AluI. They appear to be reverse transcripts of 7S RNA, part of the signal recognition particle.
Most SINEs do not encode any functional molecules and depend on the machinery of active L1 elements to be transposed; that is, copied and pasted in new locations.

Transposons and Mutations

Transposons are mutagens. They can cause mutations in several ways:

If a transposon inserts itself into a functional gene, it will probably damage it. Insertion into exons, introns, and even into DNA flanking the genes (which may contain promoters and enhancers) can destroy or alter the gene's activity.

The insertion of a retrotransposon in the DNA flanking a gene for pigment synthesis is thought to have produced white grapes from a black-skinned ancestor. Later, the loss of that retrotransposon produced the red-skinned grape varieties cultivated today.

Faulty repair of the gap left at the old site (in cut and paste transposition) can lead to mutation there.
The presence of a string of identical repeated sequences presents a problem for precise pairing during meiosis. How is the third, say, of a string of five Alu sequences on the "invading strand" of one chromatid going to ensure that it pairs with the third sequence in the other strand? If it accidentally pairs with one of the other Alu sequences, the result will be an unequal crossover — one of the commonest causes of duplications.

SINEs (mostly Alu sequences) and LINEs cause only a small percentage of human mutations. (There may even be a mechanism by which they avoid inserting themselves into functional genes.) However, they have been found to be the cause of the mutations responsible for some cases of human genetic diseases, including:

Hemophilia A (Factor VIII gene) and Hemophilia B [Factor IX gene]
X-linked severe combined immunodeficiency (SCID) [gene for part of the IL-2 receptor]
porphyria
predisposition to colon polyps and cancer [APC gene]
Duchenne muscular dystrophy [dystrophin gene]

What good are transposons?

We don't know.
They have been called "junk" DNA and "selfish" DNA.

"selfish" because their only function seems to be to make more copies of themselves and
"junk" because there is no obvious benefit to their host.

Because of the sequence similarities of all the LINEs and SINEs, they also make up a large portion of the "repetitive DNA" of the cell.
Retrotransposons cannot be so selfish that they reduce the survival of their host. Perhaps, they even confer some benefit.
Some possibilities:

Retrotransposons often carry some additional sequences at their 3' end as they insert into a new location. Perhaps these occasionally create new combinations of exons, promoters, and enhancers that benefit the host. Example:
- Thousands of our Alu elements occur in the introns of genes.
- Some of these contain sequences that when transcribed into the primary transcript are recognized by the spliceosome.
- These can then be spliced into the mature mRNA, creating a new exon, which will be translated into a new protein product.
- Alternative splicing can provide not only the new mRNA (and thus protein) but also the old.
- In this way, nature can try out new proteins without the risk of abandoning the tried-and-true old one.
L1 elements inserted into the introns of functional genes reduce the transcription of those genes without harming the gene product — the longer the L1 element, the lower the level of gene expression. Some 79% of our genes contain L1 elements, and perhaps they are a mechanism for establishing the baseline level of gene activity.
Telomerase, the enzyme essential for maintaining chromosome length, is closely related to the reverse transcriptase of LINEs and may have evolved from it.
RAG-1 and RAG-2. The proteins encoded by these genes are needed to assemble the repertoire of antibodies and T-cell receptors (TCRs) used by the adaptive immune system. The mechanism resembles that of the cut and paste method of Class II transposons, and the RAG genes may have evolved from them. If so, the event occurred some 450 million years ago when the jawed vertebrates evolved from jawless ancestors. Only jawed vertebrates have the RAG-1 and RAG-2 genes.
In Drosophila, the insertion of transposons into genes has been linked to the development of resistance to DDT and organophosphate insecticides.

Transposons and the C-value Paradox

The genome of Arabidopsis thaliana contains ~1.2 x 10⁸ base pairs (bp) of DNA. About 14% of this consists of transposons; the rest functional genes (about 28,000 of them).
The maize (corn) genome contains 20 times more DNA (2.4 x 10⁹ bp) but surely has no need for 20 times as many genes. In fact, 60% of the corn genome is made up of transposons. (The figure for humans is 42%.)
Most of the 2.5 x 10¹¹ bp of DNA in the genome of Psilotum nudum is presumably "junk" DNA.

So it seems likely that the lack of an association between size of genome and number of functional genes — the C-value paradox — is caused by the amount of transposon DNA accumulated in the genome.
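The arithmetic behind this argument can be worked through with the figures quoted above for Arabidopsis and maize. The Python sketch below uses only those numbers; the variable and function names are illustrative.

```python
# Genome sizes (bp) and transposon percentages quoted in the text above.
GENOMES = {
    "Arabidopsis thaliana": (120_000_000, 14),
    "maize": (2_400_000_000, 60),
}


def split_genome(total_bp, transposon_percent):
    """Return (transposon bp, remaining bp) using integer arithmetic."""
    transposon_bp = total_bp * transposon_percent // 100
    return transposon_bp, total_bp - transposon_bp


for name, (total_bp, percent) in GENOMES.items():
    transposon_bp, other_bp = split_genome(total_bp, percent)
    print(f"{name}: {transposon_bp:,} bp transposon, {other_bp:,} bp other")

arabidopsis_other = split_genome(*GENOMES["Arabidopsis thaliana"])[1]
maize_other = split_genome(*GENOMES["maize"])[1]

# How much bigger is maize overall, and how much bigger without transposons?
size_ratio = GENOMES["maize"][0] / GENOMES["Arabidopsis thaliana"][0]
other_ratio = maize_other / arabidopsis_other
print(f"whole genome: {size_ratio:.1f}x, non-transposon DNA: {other_ratio:.1f}x")
```

With these figures the maize genome is 20 times larger overall, but once transposons are set aside the difference shrinks to roughly 9 times.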

13 June 2011

Monday, 30 May 2011

today I'm doing jokes??????

The average woman would rather have beauty than brains, because the average man can see better than he can think......what do you think?

Friday, 20 May 2011

basic rna structure

Most cellular RNA molecules are single stranded. They may form secondary structures such as stem-loop and hairpin.

Figure 3-C-1. Secondary structure of RNA. (a) stem-loop. (b) hairpin.

The major role of RNA is to participate in protein synthesis, which requires three classes of RNA:

messenger RNA (mRNA)
transfer RNA (tRNA)
ribosomal RNA (rRNA)

Other classes of RNA include

Ribozymes
The RNA molecules with catalytic activity.
Small RNA molecules
RNA interference and other functions.

Monday, 9 May 2011

been busy, really busy doing bioinformatics

RNA Structure and Prediction
Computational Molecular Biology (BIO502)
M. Nelson and S. Istrail

RNA folding

RNA is transcribed (or synthesized) in cells as single strands of (ribose) nucleic acids. However, these sequences are not simply long strands of nucleotides. Rather, intra-strand base pairing will produce structures such as the one shown below. In RNA, guanine and cytosine pair (GC) by forming three hydrogen bonds, and adenine and uracil pair (AU) by forming two hydrogen bonds; additionally, guanine and uracil can form a weaker "wobble" base pair (GU).
The stability of a particular secondary structure is a function of several constraints:
1. The number of GC versus AU and GU base pairs.
  (Higher energy bonds form more stable structures.)
2. The number of base pairs in a stem region.
  (Longer stems result in more bonds.)
3. The number of base pairs in a hairpin loop region.
  (Formation of loops with more than 10 or fewer than 5 bases requires more energy.)
4. The number of unpaired bases, whether interior loops or bulges.
  (Unpaired bases decrease the stability of the structure.)
The stability of a secondary structure is quantified as the amount of free energy released or used by forming base pairs. Positive free energy requires work to form a configuration; negative free energies release stored work. Free energies are additive, so one can determine the total free energy of a secondary structure by adding all the component free energies (units are kilocalories per mole). The more negative the free energy of a structure, the more likely is formation of that structure, because more stored energy is released. This fact is used to predict the secondary structure of a particular sequence. Discovering a base pair configuration with the minimum possible free energy is the goal of most secondary structure prediction algorithms.
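Because free energies are additive, the total for a structure is just a sum over its parts. The Python sketch below shows the bookkeeping only: the component names and the kcal/mol values are placeholders invented for this example, not values from any published energy table.

```python
# Hypothetical components of a small hairpin, in kcal/mol.
# These numbers are placeholders for illustration, NOT real parameters.
components = [
    ("stacked pair 1", -3.3),   # stacking releases energy (negative)
    ("stacked pair 2", -2.2),
    ("hairpin loop", +4.5),     # closing a loop costs energy (positive)
]


def total_free_energy(parts):
    """Sum the free energy contributions of all structural components."""
    return sum(energy for _, energy in parts)


total = total_free_energy(components)
print(f"total free energy: {total:.1f} kcal/mol")
```

A folding program compares totals like this across candidate structures and prefers the most negative one.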
To compute the minimum free energy of a sequence, empirical energy parameters are used. These parameters summarize free energy change (positive or negative) associated with all possible pairing configurations, including base pair stacks and internal base pairs, internal, bulge and hairpin loops, and various motifs which are known to occur with great frequency. Zuker has online tables of free energy and enthalpy values for various motifs.
Four major classes of RNA exist, and can be found in most organisms:
1. mRNA - messenger RNA, is a sequence which codes for formation of one or more proteins.
2. tRNA - transfer RNA, small (~80 bases) sequences which bring amino acids to the ribosome, where they translate mRNA into amino acid sequences.
3. rRNA - ribosomal RNA sequences form ribosomes (along with ribosomal proteins).
  (You can read more about the first three classes by clicking here.)
4. viral RNA (You can see some viral RNA structures here and here.)
It is important to note that most RNA folding algorithms predict only secondary, rather than tertiary structure. The three-dimensional shape of the molecule is important to molecular function, but is harder to predict. This is because tertiary structure is known from crystallography for only tRNA sequences (as illustrated at the top of this page). Secondary structure is usually considered a sufficient approximation, until more is known about tertiary structure of RNA.

Predicting RNA secondary structure

Several representations of secondary structure have been utilized, each with different advantages. The planar graph representation shown above gives an intuition for the shape of an RNA sequence, but the same structure could also be represented in string notation. In string notation, balanced parentheses are used to indicate paired bases, and periods are used to indicate unpaired bases. The secondary structure in the above figure is given as ((((((((((((((....)))))))))))))) in string notation. For a discussion of the advantages of string notation, and examples of other representation schemes, see Hofacker et al. (1995) and Gruner et al. (1995).
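String notation is easy to turn back into a list of base pairs with a stack: each closing parenthesis pairs with the most recent unmatched opening one. The Python sketch below is a minimal illustration; the function name and the short example string are made up, and positions are counted from 0.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string."""
    open_positions = []   # stack of unmatched "(" positions
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((open_positions.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


example_pairs = parse_dot_bracket("((..))")
print(example_pairs)
```

The same function works on longer strings such as the hairpin quoted above, where every opening parenthesis on the left pairs with a closing one on the right.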
The number of possible secondary structures (S) of n bases with k base pairs is given as
A number of strategies for predicting secondary structure have been developed. Gruner et al. provide a taxonomy of folding algorithms, and references for each algorithm. Their table is summarized here:
- Deterministic
- Stochastic
  - Simulated annealing*
* algorithm can predict pseudo-knots

The Waterman algorithm

Now that we can find the minimum free energy structure of a sequence in computationally tractable time, we should ask "What does the optimum tell us?" That is, there may be more than one structure with the optimum free energy, or there may be many structures within 5% to 10% of the minimum free energy, and these may be topologically very different. A minimum energy folding algorithm will return only one secondary structure, though there are many candidates for the natural structure. To address this, some software packages (such as Zuker's mfold) will display a number of suboptimal folds. Inferring what structure is truly representative of the natural structure requires additional information. Phylogenetic information is often used to constrain the search by identifying highly conserved motifs. Some programs allow the user to specify constraints on the secondary structure, by specifying paired, single-stranded, or non-pairable regions, or by actively participating in the folding process.
Of course, there are a number of limiting assumptions to existing folding algorithms. These include the kinetics of folding during transcription, the difficulty of predicting pseudo-knots, the role of chaperone proteins in folding, and the importance of modified bases (e.g. inosine or methylated bases). Some algorithms attempt to incorporate these considerations (e.g. see Abrahams et al. for predicting pseudo-knots). At best, RNA folding algorithms are first-order approximations used to infer the natural structure of a known sequence.
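To give a feel for how dynamic programming makes folding tractable, here is a small Python sketch. Note that it is a simplified stand-in: it maximizes the number of base pairs (a Nussinov-style recurrence) rather than minimizing free energy with empirical parameters as mfold does, it cannot represent pseudo-knots, and the minimum loop size of 3 is an assumption chosen for the example.

```python
def can_pair(a, b):
    """GC, AU and GU pairs are allowed, in either order."""
    return {a, b} in ({"G", "C"}, {"A", "U"}, {"G", "U"})


def max_base_pairs(seq, min_loop=3):
    """Largest number of nested base pairs seq can form.

    best[i][j] holds the answer for the subsequence seq[i..j]; at least
    min_loop unpaired bases must separate the two partners of any pair.
    """
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]            # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(max_base_pairs("GGGAAAUCC"))
```

Replacing the "+ 1" per pair with energy terms for stacks and loops is, roughly speaking, what separates this toy from the energy-based algorithms discussed above.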

Related Sites

RNA world at IMB Jena. This page contains links to databases and software, information about meetings, and a number of search utilities.
A list of RNA related sites, compiled by Cambridge University Press.
Image library of biological macromolecules, at IMB Jena has illustrations of molecular structure.
Molecules R US, maintained by NIH, has a fancy interface to structural information in the Brookhaven Protein Database. You can use the interface to view molecules by a number of methods.
Michael Zuker's rna page
M. Zuker's interactive mfold server will fold sequences online.
The Vienna RNA ftp site is located here.
A list of folding software links, for a variety of platforms.
Abstracts at the Institute for Theoretical Chemistry, in Vienna
Abstracts at the Santa Fe Institute

References

Abrahams, J.P., M. van den Berg, E. van Batenburg, and C. Pleij. 1990. Prediction of RNA secondary structure, including pseudo-knotting by computer simulation. Nucleic Acids Research 18:3035-3044.
Gesteland, R.F., and J.F. Atkins, eds. 1993. The RNA World. Cold Spring Harbor Laboratory Press.
Gruner, W., R. Giegerich, D. Strothmann, C. Reidys, J. Weber, I. Hofacker, P. Stadler, and P. Schuster. Analysis of RNA sequence structure maps by exhaustive enumeration. Santa Fe Institute Preprint 95-10-099.
Jaeger, J.A., D.H. Turner and M. Zuker. 1990. Predicting optimal and suboptimal secondary structure for RNA, in "Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences", R.F. Doolittle ed., Methods in Enzymology 183, 281-306.
McCaskill, J.S. 1990. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 1105-19.
Stolorz, P., M. Huynen, I. Hofacker, and P. Stadler. RNA folding on massively parallel computers. Santa Fe Institute preprint 95-10-089.
Turner, D.H., N. Sugimoto, and S.M. Freier. 1988. RNA structure prediction. Ann. Rev. Biophys. and Biophys. Chem. 17: 167-192.
Waterman, M.S., and T.H. Byers. 1985. A dynamic programming algorithm to find all solutions in a neighborhood of the optimum. Mathematical Biosciences, 77, 179-188.
Williams, A.L., & Tinoco, I.Jr. 1986. A dynamic programming algorithm for finding alternate RNA secondary structures. Nucleic Acids Research, 14, 299-315.
Zuker, M.. 1989. On finding all suboptimal foldings of an RNA molecule. Science 244:48-52.

Lecture notes compiled by P. Hraber, May 1996
Please send comments, additions, and corrections to him.

been busy, realy busy doing bioinformatics

RNA Structure and Prediction
Computational Molecular Biology (BIO502)
M. Nelson and S. Istrail

RNA folding

RNA is transcribed (or synthesized) in cells as single strands of (ribose) nucleic acids. However, these sequences are not simply long strands of nucleotides. Rather, intra-strand base pairing will produce structures such as the one shown below. In RNA, guanine and cytosine pair (GC) by forming a triple hydrogen bond, and adenine and uracil pair (AU) by a double hydrogen bond; additionally, guanine and uracil can form a single hydrogen bond base pair.
The stability of a particular secondary structure is a function of several constraints:
1. The number of GC versus AU and GU base pairs.
  (Higher energy bonds form more stable structures.)
2. The number of base pairs in a stem region.
  (Longer stems result in more bonds.)
3. The number of base pairs in a hairpin loop region.
  (Formation of loops with more than 10 or less than 5 bases requires more energy.)
4. The number of unpaired bases, whether interior loops or bulges.
  (Unpaired bases decrease the stability of the structure.)
The stability of a secondary structure is quantified as the amount of free energy released or used by forming base pairs. Positive free energy requires work to form a configuration; negative free energies release stored work. Free energies are additive, so one can determine the total free energy of a secondary structure by adding all the component free energies (units are kilocalories per mole). The more negative the free energy of a structure, the more likely is formation of that structure, because more stored energy is released. This fact is used to predict the secondary structure of a particular sequence. Discovering a base pair configuration with the minimum possible free energy is the goal of most secondary structure prediction algorithms.
To compute the minimum free energy of a sequence, empirical energy parameters are used. These parameters summarize free energy change (positive or negative) associated with all possible pairing configurations, including base pair stacks and internal base pairs, internal, bulge and hairpin loops, and various motifs which are know to occur with great frequency. Zuker has online tables of free energy and enthalpy values for various motifs.
Four major classes of RNA exist, and can be found in most organisms:
1. mRNA - messenger RNA, is a sequence which codes for formation of one or more proteins.
2. tRNA - transfer RNA, small (~80 bases) sequences which bring amino acids to the ribosome, where they translate mRNA into amino acid sequences.
3. rRNA - ribosomal RNA sequences form ribosomes (along with ribosomal proteins).
  (You can read more about the first three classes by clicking here.)
4. viral RNA (You can see some viral RNA structures here and here.)
It is important to note that most RNA folding algorithms predict only secondary, rather than tertiary, structure. The three-dimensional shape of the molecule is important to molecular function, but is harder to predict. This is because tertiary structure is known from crystallography for only tRNA sequences (as illustrated at the top of this page). Secondary structure is usually considered a sufficient approximation, until more is known about the tertiary structure of RNA.

Predicting RNA secondary structure

Several representations of secondary structure have been utilized, each with different advantages. The planar graph representation shown above gives an intuition for the shape of an RNA sequence, but the same structure could also be represented in string notation. In string notation, balanced parentheses are used to indicate paired bases, and periods are used to indicate unpaired bases. The secondary structure in the above figure is given as ((((((((((((((....)))))))))))))) in string notation. For a discussion of the advantages of string notation, and examples of other representation schemes, see Hofacker et al. (1995) and Gruner et al. (1995).
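String notation is straightforward to convert back into a list of paired positions with a stack. The sketch below assumes 0-based positions; the function name `parse_dot_bracket` is hypothetical.

```python
# Hypothetical helper: convert string (dot-bracket) notation to base pairs.
def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) pairs, using 0-based positions."""
    stack = []   # positions of "(" still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

if __name__ == "__main__":
    print(parse_dot_bracket("((..))"))
```

Because each closing parenthesis pairs with the most recent unmatched opening one, this notation can only describe nested structures; it has no way to express pseudo-knots.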
The number of possible secondary structures (S) of n bases with k base pairs is given as
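Such a count can also be obtained by direct recursion, which is a useful way to check small cases. The sketch below is an illustration under stated assumptions: structures are nested (no pseudo-knots), any two bases may pair, and every pair encloses at least `MIN_LOOP = 1` unpaired base. The function name `count_structures` and the value of `MIN_LOOP` are choices made here, not taken from the notes.

```python
from functools import lru_cache

# Assumed minimum number of bases enclosed by any pair (illustrative choice).
MIN_LOOP = 1

@lru_cache(maxsize=None)
def count_structures(n, k):
    """Count nested secondary structures on n bases with exactly k base pairs."""
    if k < 0:
        return 0
    if k == 0:
        return 1  # the single structure with every base unpaired
    if n < 2 * k + MIN_LOOP:
        return 0  # too few bases to hold k pairs
    # Case 1: the last base (n) is unpaired.
    total = count_structures(n - 1, k)
    # Case 2: the last base pairs with base j; the structure then splits into
    # an independent left part (bases 1..j-1) and an inside part (j+1..n-1).
    for j in range(1, n - MIN_LOOP):
        inside = n - j - 1
        for k_left in range(k):
            total += (count_structures(j - 1, k_left)
                      * count_structures(inside, k - 1 - k_left))
    return total

if __name__ == "__main__":
    for n, k in [(3, 1), (4, 1), (5, 1), (6, 2)]:
        print(n, k, count_structures(n, k))
```

Even under these permissive assumptions the counts grow quickly with n, which is why exhaustive enumeration is not a practical folding strategy.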
A number of strategies for predicting secondary structure have been developed. Gruner et al. provide a taxonomy of folding algorithms, and references for each algorithm. Their table is summarized here:
- Deterministic
- Stochastic
  - Simulated annealing*
* algorithm can predict pseudo-knots
The Waterman algorithm
Now that we can find the minimum free energy structure of a sequence in computationally tractable time, we should ask, "What does the optimum tell us?" That is, there may be more than one structure with the optimum free energy, or there may be many structures within 5% to 10% of the minimum free energy, and these may be topologically very different. A minimum energy folding algorithm will return only one secondary structure, though there are many candidates for the natural structure. To address this, some software packages (such as Zuker's mfold) will display a number of suboptimal folds. Inferring which structure is truly representative of the natural structure requires additional information. Phylogenetic information is often used to constrain the search by identifying highly conserved motifs. Some programs allow the user to specify constraints on the secondary structure, by specifying paired, single-stranded, or non-pairable regions, or by actively participating in the folding process.
Of course, existing folding algorithms rest on a number of limiting assumptions. They generally neglect the kinetics of folding during transcription, the difficulty of predicting pseudo-knots, the role of chaperone proteins in folding, and the importance of modified bases (e.g. inosine or methylated bases). Some algorithms attempt to incorporate these considerations (e.g. see Abrahams et al. for predicting pseudo-knots). At best, RNA folding algorithms are first-order approximations used to infer the natural structure of a known sequence.

Related Sites

RNA world at IMB Jena. This page contains links to databases and software, information about meetings, and a number of search utilities.
A list of RNA related sites, compiled by Cambridge University Press.
Image library of biological macromolecules at IMB Jena, with illustrations of molecular structure.
Molecules R US, maintained by NIH, has a fancy interface to structural information in the Brookhaven Protein Database. You can use the interface to view molecules by a number of methods.
Michael Zuker's RNA page
M. Zuker's interactive mfold server will fold sequences online.
The Vienna RNA ftp site is located here.
A list of folding software links, for a variety of platforms.
Abstracts at the Institute for Theoretical Chemistry, in Vienna
Abstracts at the Santa Fe Institute

References

Lecture notes compiled by P. Hraber, May 1996
Please send comments, additions, and corrections to him.

Tuesday, 19 April 2011

understanding bioinformatics

Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.

Wednesday, 6 April 2011

misconceptions about bioinformatics

What do you think when a professional tells you he or she is a bioinformatician? What is the first thought that crosses your mind?

RNA Structure and Prediction
Computational Molecular Biology (BIO502)
M. Nelson and S. Istrail