Monday, September 10, 2007

A number of genetic code diagrams

Bresch and Hausmann took Crick's matrix table of the genetic code, i.e. the decoding instructions of translation in which the information stored in the sequence of nucleic acids is transferred to the sequence of amino acids, and were the first to publish a circular diagram of the code.

However, the arrangement of codon bases is to some degree arbitrary. The repetitive motif UCAG separates bases according to size (the pyrimidine bases U and C fall together, as do the purine bases A and G), groups the inosine-binding bases U, C and A together and allows the amino acids methionine and isoleucine to be listed in separate groups. When adhering to a constant string of bases, the UCAG motif offers a higher degree of packing of the code, as demonstrated by Serguei Lenski on his homepage, which contains a mathematical approach to packing of the genetic code. His results indicate that listing codons in the order 2-1-3 (second base first, then first base, then third) and retaining the UCAG motif offers an optimal amount of packing, but that does not mean that there are no other ways to represent the code. The physicist Yurij Rumer (1901-1985) preferred the motif CGUA for various reasons, placing more emphasis on the strength of the bonds formed between the cognate bases (C-G forming three hydrogen bonds, A-U forming only two; see D. A. Semenov's paper if you have no access to Rumer's original publication in Russian: Rumer IuB. [Codon systematization in the genetic code] Dokl Akad Nauk SSSR. 1966 Apr 21;167(6):1393-4.).
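To make the difference between the listing orders concrete, here is a minimal Python sketch that sorts the 64 codons of the standard code either in the usual 1-2-3 order or in the 2-1-3 order. The function names and the run count are my own assumptions for illustration: counting unbroken stretches of the same amino acid is only a crude stand-in for "packing" and is not Lenski's measure.

```python
# Standard genetic code, written in the usual U-C-A-G order for bases 1, 2, 3.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def order_codons(positions, motif):
    """Sort all 64 codons, reading the 0-based codon positions in the given priority."""
    return sorted(CODE, key=lambda codon: [motif.index(codon[p]) for p in positions])


def count_runs(codons):
    """Hypothetical packing proxy: number of unbroken same-amino-acid stretches."""
    aas = [CODE[c] for c in codons]
    return 1 + sum(a != b for a, b in zip(aas, aas[1:]))


order_123 = order_codons((0, 1, 2), "UCAG")  # conventional table order
order_213 = order_codons((1, 0, 2), "UCAG")  # second base first

print("".join(CODE[c] for c in order_123))
print("".join(CODE[c] for c in order_213))
print(count_runs(order_123), count_runs(order_213))
```

Passing "CGUA" or "AGCU" as the motif instead of "UCAG" lets you compare the other arrangements discussed in this post in the same way.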

I believe that the motif AGCU (or UCGA), when viewed from a circular perspective, offers a happy compromise between the two. Furthermore, by placing the second codon base at the centre of the diagram it becomes possible to unite the codons of leucine, serine, arginine and stop, and to see codons cluster into groups according to the chemical properties of the respective amino acids: M, I, V, L and F are all hydrophobic; K, N, D, E, Q, H and Y are all hydrophilic; T, S and hydroxyproline (the hydroxylated form of P) carry hydroxyl groups, whilst A is structurally related to S. The last group comprises amino acids at the extremes, from the smallest (G) to the largest (W) and from the most hydrophilic (R) to the hydrophobic C. However, S is structurally related to C, and C, U, S and G can be converted into one another biosynthetically. In addition, S is a substrate for W synthesis (for more information on the role of biosynthetic pathways in shaping the genetic code, I would recommend: Wong JT. A co-evolution theory of the genetic code. Proc Natl Acad Sci U S A. 1975 May;72(5):1909-12.). Finally, the rare genetically encoded amino acid pyrrolysine (O), found in members of the Methanosarcinaceae family of archaea, is a lysine derivative containing a pyrroline ring, which reminds me of the ring structures found in the amino acids W, Y and H and somehow fits neatly into the scheme, close to K, H, Y and W.
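The clustering by second base can be checked with a few lines of Python. This is only a sketch: the function name and the `SECTORS` variable are made up for illustration, and the AGCU motif is used solely to fix the order in which the four groups are listed.

```python
from collections import defaultdict

# Standard genetic code, written in the usual U-C-A-G order for bases 1, 2, 3.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def amino_acids_by_second_base(code, motif="AGCU"):
    """Collect the amino acids (and stop, '*') encoded under each second base."""
    groups = defaultdict(set)
    for codon, amino_acid in code.items():
        groups[codon[1]].add(amino_acid)
    # Return the groups in motif order, one per central base of the diagram.
    return {base: groups[base] for base in motif}


SECTORS = amino_acids_by_second_base(CODE)
for base, amino_acids in SECTORS.items():
    print(base, "".join(sorted(amino_acids)))
```

The four lines printed correspond to the four groups described above, with S and stop each turning up under two different central bases.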

The 3D model of the genetic code along 2-1-3 rules allows a number of projections that represent distorted "maps" of the code. In this post are three projections of the standard code as well as four diagrams for variants of the standard code found in mitochondria, mycoplasma, ciliates and green algae, although these graphs are oversimplified. For the recently evolved genetic code in Candida, where an L codon has changed into an S codon, it is not possible to arrange the codons so that all of them are grouped together according to their respective amino acid. Even so, one can follow the small changes that make such a big difference, with bases 1 and 2 both remaining pyrimidines in the genetic code of Candida. For details of these and more variants of the standard code I would like to draw your attention to the taxonomy browser at NCBI. For more of my thoughts on the genetic code you can follow this link. Thank you.
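The variant codes differ from the standard code in only a handful of codons, which a short Python sketch can list. The reassignments below are written down as I understand them from the NCBI translation tables, so please check them there before relying on them; the dictionary and function names are hypothetical.

```python
# Standard genetic code, written in the usual U-C-A-G order for bases 1, 2, 3.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumed reassignments relative to the standard code ('*' means stop).
VARIANTS = {
    "vertebrate mitochondria": {"AGA": "*", "AGG": "*", "AUA": "M", "UGA": "W"},
    "invertebrate mitochondria": {"AGA": "S", "AGG": "S", "AUA": "M", "UGA": "W"},
    "mycoplasma": {"UGA": "W"},
    "ciliates and green algae": {"UAA": "Q", "UAG": "Q"},
    "Candida": {"CUG": "S"},
}


def variant_code(name):
    """Return the full 64-codon table for one variant."""
    return {**STANDARD, **VARIANTS[name]}


def changes(name):
    """List (codon, standard meaning, variant meaning) for every changed codon."""
    return [(c, STANDARD[c], aa) for c, aa in sorted(VARIANTS[name].items())]


def first_two_are_pyrimidines(codon):
    """True if bases 1 and 2 of the codon are both U or C."""
    return all(base in "UC" for base in codon[:2])


for name in VARIANTS:
    print(name, changes(name))
print(first_two_are_pyrimidines("CUG"), first_two_are_pyrimidines("UCU"))
```

The last line compares the reassigned Candida codon CUG with UCU, one of the standard serine codons: in both, bases 1 and 2 are pyrimidines.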

Standard code: 1

Standard code: 2

Standard code: 3

Vertebrate Mitochondria Code

Invertebrate Mitochondria Code

Yeast Mitochondria Code
(Codons CGA and CGC absent)


Ciliates and green algae

