With the print version of my contribution to the Journal of Theoretical Biology made available today, here's a question that has been on my mind for quite some time.
I keep wondering how the basic components of the genetic code may have evolved in the absence of any translational machinery. In other words, is it possible that precursors to today's tRNAs simply bound to amino acids, thus enabling them to accumulate in higher concentrations than would have been possible in the absence of such interactions? tRNA-like molecules that bound to hydrophobic amino acids may have been able to interact with lipids at the liquid/air or the liquid/solid interface. As those tRNA-like molecules accumulated, other tRNA-like molecules that bound to hydrophilic amino acids may in turn have formed aggregates with the hydrophobic amino acid-binding tRNAs via the codon-equivalent regions, leading to the basic dichotomy between N-U-N and N'-A-N' codons and to a microenvironment with favourable conditions for interactions between different species of nucleic and amino acids. The hydrophilic amino acids would have had to balance the charge inequalities, leading to basic and acidic amino acids landing in close codon proximity. The next step would be that RNA molecules with catalytic activity bound these adaptor RNAs at strategic positions and the attached amino acids started to be a part of the catalytic process. Finally, the translational machinery would have evolved and the code would have continued to change with it. Some RNA-amino acid pairings would have been thrown out of the race and new ones would have joined the process at this stage. A drive to reduce codon ambiguity and a drive to minimize errors in translation are likely to have been two important factors in the evolution of the code, but some initial rules laid down by the pre-translation system were already in place before the era of large-scale protein synthesis.
I am thankful for any comments you might care to send my way.
If you are interested in more literature on the genetic code, I can recommend:
Nirenberg, MW, Matthaei, JH, Jones, OW. An intermediate in the biosynthesis of polyphenylalanine directed by synthetic template RNA. Proc Natl Acad Sci U S A. 1962 Jan 15;48:104-9.
Woese, CR. On the evolution of the genetic code. Proc Natl Acad Sci U S A. 1965 Dec;54(6):1546-52.
Crick, FH. The origin of the genetic code. J Mol Biol. 1968 Dec;38(3):367-79.
Orgel, LE. Evolution of the genetic apparatus. J Mol Biol. 1968 Dec;38(3):381-93.
Wong, JT. A co-evolution theory of the genetic code. Proc Natl Acad Sci U S A. 1975 May;72(5):1909-12.
Taylor, FJ, Coates, D. The code within the codons. Biosystems. 1989;22(3):177-87.
Szathmáry, E. The origin of the genetic code: amino acids as cofactors in an RNA world. Trends in Genetics. 1999 June;15(6): 223-9.
Massey, SE. A sequential "2-1-3" model of genetic code evolution that explains codon constraints. J Mol Evol. 2006 Jun;62(6):809-10. Epub 2006 Apr 11.
Thursday, August 07, 2008
Friday, May 02, 2008
Nearly there...
Finally, the "Journal of Theoretical Biology" accepted my manuscript for publication as a letter to the editor. The unedited version of the article can be found ahead of print online at
http://dx.doi.org/10.1016/j.jtbi.2008.04.028.
What a relief. I know it's just one little paper and the impact factor of the journal could be higher, but for some reason this paper is incredibly important to me. I don't know if I have made a huge fool of myself or not, but it's too late anyway. I tried to keep the article as short as possible and did not include any acknowledgements, but of course I would not have been able to persevere in this ridiculous struggle had I not had the support of some incredible people along the way. I don't know whether mRNA-tRNA interaction occurs in a 2-1-2-3 way, but I find the sheer oddness of a language that starts with the middle instead of the beginning of an information unit exciting and cannot get the picture of tRNA-precursor molecules forming aggregates with the second codon base acting as an anchor out of my head. Imagine the message
"hteatsiksontosmcuhotseewahtonoenhsayteseenubtottihnwkhtanoonheaysetthuogthaobutthtawihcehvreyobdsyese."
being translated as "thetaskisnotsomuchtoseewhatnoonehasyetseenbuttothinkwhatnoonehasyetthoughtaboutthatwhicheverybodysees." and finally for the information to "fold" into the beautiful quote: "The task is not so much to see what no one has yet seen but to think what no one has yet thought, about that which everybody sees."
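The 2-1-3 reading of the scrambled message can be sketched in a few lines of Python. This is only an illustration of the word game, not a model of codon-anticodon binding: the name `read_213` is made up for this example, and the sketch simply assumes that the text is cut into units of three letters and that each unit is read middle letter first.

```python
# Hypothetical helper for the word game above; not taken from the paper.
SCRAMBLED = (
    "hteatsiksontosmcuhotseewahtonoenhsayteseenubtottihnwkhtanoonhea"
    "ysetthuogthaobutthtawihcehvreyobdsyese"
)


def read_213(message: str) -> str:
    """Read each three-letter unit in the order 2-1-3 (middle letter first)."""
    units = []
    for start in range(0, len(message), 3):
        unit = message[start:start + 3]
        if len(unit) >= 2:
            # Swap the first two letters; the third (if present) stays in place.
            unit = unit[1] + unit[0] + unit[2:]
        units.append(unit)
    return "".join(units)


if __name__ == "__main__":
    print(read_213(SCRAMBLED))
```

Because swapping the first two letters of a unit is its own inverse, applying `read_213` to the plain text reproduces the scrambled message.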
Of course, if the interaction between mRNA and tRNA occurred in a 1-2-3 way, the importance of the second base might be explained by a longer interval spent reading that base, so the order would roughly be 1-2-2-3. If the binding of codon and anticodon happened in such a way that all three bases bound simultaneously, then the base in the middle might spend the longest time bound to its cognate because the bases 5' and 3' of it would act as a sort of buffer (like velcro, where the middle bit is usually the hardest to get off).
With possible closure on the rearrangement of the genetic code in sight, I wonder if somebody else might pick up on the methionine story. Perhaps one group might try to engineer an organism that uses an initiator-tRNA charged with isoleucine, valine, threonine or homocysteine instead of methionine and report on what the resulting phenotype might be...
But all of this has merely been a side project; the biggest reward for me would be if my initial hypothesis about the involvement of S-adenosylmethionine in determining resistance or susceptibility in experimental leishmaniasis proved to be useful.
Sunday, September 23, 2007
Selenocysteine and Pyrrolysine
More a postscript than a real post, but isn't it amazing how selenocysteine and pyrrolysine, the 21st and 22nd proteinogenic amino acids, fit into the 2-1-3 genetic code scheme?
Selenocysteine (Sec / U): mRNA codon UGA. Related to cysteine, it is in the same subgroup as tryptophan and cysteine and in the same group as serine.
Pyrrolysine (Pyl / O): mRNA codon UAG. Consisting of a lysine backbone and a pyrroline ring containing a pi electron pair, just as in tyrosine, histidine, tryptophan and phenylalanine, it is in the same subgroup as tyrosine and in the same group as histidine and lysine.
It's almost spooky, don't you think?
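For anyone who wants to check these neighbourhoods, here is a minimal sketch in Python. It assumes that "group" means all codons sharing the second base and "subgroup" means all codons sharing both the second and the first base; that is my reading of the scheme for this example, and the function names are made up. The amino acid string is the standard code in the usual UCAG table order, with `*` marking stop codons.

```python
# Standard genetic code, codons enumerated in UCAG order (first, second, third base).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def group(codon: str) -> set:
    """Amino acids whose codons share the second base (assumed meaning of 'group')."""
    return {
        aa for c, aa in STANDARD_CODE.items()
        if c[1] == codon[1] and aa != "*"
    }


def subgroup(codon: str) -> set:
    """Amino acids whose codons share second and first base (assumed 'subgroup')."""
    return {
        aa for c, aa in STANDARD_CODE.items()
        if c[1] == codon[1] and c[0] == codon[0] and aa != "*"
    }


if __name__ == "__main__":
    for name, codon in (("Sec", "UGA"), ("Pyl", "UAG")):
        print(name, codon, sorted(subgroup(codon)), sorted(group(codon)))
```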
Monday, September 10, 2007
A number of genetic code diagrams
Bresch and Hausmann took Crick's matrix table of the genetic code, i.e. the decoding instructions of translation in which the information stored in the sequence of nucleic acids is transferred to the sequence of amino acids, and were the first to publish a circular diagram of the code.

However, the arrangement of codon bases is to some extent arbitrary. The repetitive motif UCAG separates bases according to size (pyrimidine bases U and C, purine bases A and G fall together), groups the inosine-binding bases U, C and A together and allows for the amino acids methionine and isoleucine to be listed in separate groups. When adhering to a constant string of bases, the UCAG motif offers a higher degree of packing of the code as demonstrated by Serguei Lenski on his homepage, which contains a mathematical approach to packing of the genetic code. His results indicate that listing codons in the order of 2-1-3 and retaining the UCAG motif offers an optimal amount of packing, but that does not mean that there aren't any other ways to represent the code. The physicist Yurij Rumer (1901-1985) preferred the motif CGUA for various reasons, placing more emphasis on the strength of bonds formed between the cognate bases (C-G forming three hydrogen bonds, A-U forming only two; see D. A. Semenov's paper if you have no access to Rumer's original publication in Russian: Rumer IuB. [Codon systematization in the genetic code] Dokl Akad Nauk SSSR. 1966 Apr 21;167(6):1393-4.).
I believe that the motif AGCU (or UCGA), when viewed from a circular perspective, offers a happy compromise between the two. Furthermore, by placing the second codon base at the centre of the diagram it becomes possible to unite the codons of leucine, serine, arginine and stop and to see codons cluster into groups according to the chemical properties of the respective amino acid: M, I, V, L, F are all hydrophobic; K, N, D, E, Q, H, Y are all hydrophilic; T, S and hydroxy-proline carry hydroxyl groups whilst A is structurally related to S; the last group comprises amino acids at the extremes, from the smallest G to the largest W and from the most hydrophilic R to the hydrophobe C. However, S is structurally related to C, and C, U, S and G can be converted into one another biosynthetically. In addition, S is a substrate for W synthesis (for more information on the role of biosynthetic pathways in shaping the genetic code, I would recommend: Wong JT. A co-evolution theory of the genetic code. Proc Natl Acad Sci U S A. 1975 May;72(5):1909-12.). Finally, the rare genetically encoded amino acid pyrrolysine (O) found in members of the Methanosarcinaceae family of archaea is a lysine derivative containing a pyrroline ring, which reminds me of the structures found in the amino acids W, Y and H and somehow fits neatly into the scheme with proximity to K, H, Y and W.
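The clustering by second base is easy to reproduce. The following is a minimal Python sketch, assuming the standard code and a simple linear sort in which the second base is compared first, then the first, then the third, each ranked by the AGCU motif. The function names are hypothetical, and a flat sort only approximates the circular diagram.

```python
from itertools import product

# Standard genetic code in the usual UCAG table order; '*' marks stop codons.
TABLE_ORDER = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(bases): aa
    for bases, aa in zip(product(TABLE_ORDER, repeat=3), AMINO_ACIDS)
}


def order_213(motif: str = "AGCU") -> list:
    """Sort all 64 codons by second, then first, then third base (2-1-3)."""
    rank = {base: position for position, base in enumerate(motif)}
    return sorted(
        STANDARD_CODE,
        key=lambda codon: (rank[codon[1]], rank[codon[0]], rank[codon[2]]),
    )


def amino_acids_by_second_base(motif: str = "AGCU") -> dict:
    """Collect the amino acids (and stops) encoded under each second base."""
    groups = {base: [] for base in motif}
    for codon in order_213(motif):
        amino_acid = STANDARD_CODE[codon]
        if amino_acid not in groups[codon[1]]:
            groups[codon[1]].append(amino_acid)
    return groups


if __name__ == "__main__":
    for second_base, members in amino_acids_by_second_base().items():
        print(f"N-{second_base}-N:", " ".join(members))
```

Passing a different motif string, such as `"UCAG"` or Rumer's `"CGUA"`, changes only the order in which the codons are listed, not the membership of the four groups.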

The 3D model of the genetic code along 2-1-3 rules allows a number of projections that represent distorted "maps" of the code. In this post are three projections of the standard code as well as four diagrams for variants of the standard code, found in mitochondria, Mycoplasma, ciliates and green algae, although these graphs are oversimplified. For the recently evolved genetic code in Candida, where an L codon has changed into an S codon, it is not possible to combine the codons in a way that allows for all codons to be grouped together according to their respective amino acid. Even so, one can follow the small changes that make such a big difference, with the bases 1 and 2 both remaining pyrimidines in the genetic code of Candida. For details of these and more variants of the standard code I would like to draw your attention to the taxonomy browser at NCBI. For more of my thoughts on the genetic code you can follow this link to blog.rna-game.org. Thank you.
Standard code: 1

Standard code: 2

Standard code: 3

Vertebrate Mitochondria Code

Invertebrate Mitochondria Code

Yeast Mitochondria Code
(Codons CGA and CGC absent)

Mycoplasma

Ciliates and green algae

Ciliates


Thursday, August 23, 2007
Another model
When I set out to realign the genetic code, it was almost like playing a game of sudoku. The intention was to see if there was a way to simplify the way the code was represented, as an answer to some who claimed that the code was too complex to have evolved on its own. Now that it looks so shockingly simple, some might claim that it is so perfectly simple that only a designer could have made it. This I cannot agree with: it's not perfect, whatever that may mean, and simple physical and chemical forces appear sufficient to explain the evolution of the code based on the material that was available at the time. There are countless little and big influences that made me have a go at the code, but Tetris and Calder are sure to have played a role in this...
I would suggest that a three-dimensional representation would be an even better model. All you need if you want to build a model yourself at home is a couple of molecular building block kits. Make sure that you have blocks that allow for tetra- and penta-valent binding (e.g. carbon atoms in normal and transitional state).
Start with a tetravalent-binding block at the centre to get the tetrahedral base, use the binding elements to represent A, G, C, and U and continue with the pentavalent-binding blocks until you reach the level of amino acids, where you can then use different coloured blocks to symbolise the individual amino acids.
This approach lets you bring amino acid codons that seemed at opposite poles into close proximity (e.g. K and R or F and Y). I know, it may look confusing to begin with, but it adds another layer of information. The categories for grouping the amino acid codons on www.rna-game.org are after all somewhat subjective (non-polar, transitional, special, polar); maybe there are other categories that should be used to group the codons.
I guess what I mean to say is, have fun. Of course sincerity and truth remain the essence of science, and empirical and rational thinking are the backbone of this line of work, but intuition and a sense of wonder are crucial elements as well.

The graphs on www.rna-game.org and blog.rna-game.org are distorted two-dimensional maps of the genetic code in three dimensions as shown above.
