Genetic Code: Introduction

  1. Kimitsuna Watanabe

Published Online: 24 OCT 2002

DOI: 10.1038/npg.els.0000809

eLS

How to Cite

Watanabe, K. 2002. Genetic Code: Introduction. eLS.

Author Information

  1. University of Tokyo, Tokyo, Japan

This is not the most recent version of the article. View current version (17 OCT 2011)

Historical Background to Breaking the Code

  1. Top of page
  2. Historical Background to Breaking the Code
  3. Topology of the Code as Revealed by Frameshift Mutations
  4. The Code Decoded Using Protein Synthesis In Vitro
  5. Termination and Initiation Codons
  6. In Vivo Code
  7. The Universal Genetic Code
  8. Progress after the Cold Spring Harbor Symposium
  9. Further Reading

The ‘DNA double helix’ model proposed by James Watson and Francis Crick in 1953 demonstrated that a gene consists of a stretch of double-stranded DNA (deoxyribonucleic acid), which can replicate itself according to a complementary base-pairing rule. This landmark discovery immediately raised the question as to how DNA, which constitutes the gene, is able to direct the production of proteins: that is, how information stored in nucleic acid is expressed as information contained in proteins. Because this was essentially a coding problem, the correlation between nucleic acid bases and the corresponding amino acid in the protein was named the ‘genetic code’. The field of molecular biology can be said to have developed largely as a result of the dedicated work that went into elucidating this code. See also Double Helix: Discovery and Properties, DNA Structure: A-, B- and Z-DNA Helix Families, Watson, James Dewey, and Crick, Francis Harry Compton

Shortly after the structure of the DNA double helix was deduced, a theoretical physicist, George Gamow, arrived at a hypothesis that attempted to resolve the coding problem. He postulated that in protein synthesis each amino acid is inserted into a specific cavity created around a complementary pair of nucleotide bases, and polymerization then takes place along and around the rotational axis of the DNA double helix. Gamow envisaged the cavities for amino acid insertion as being diamond-shaped, bounded by a base at each of the four corners. If we consider the helix to be vertically oriented, then the left and right corners of the diamond comprise a complementary base pair – either adenine (A) and thymine (T), or guanine (G) and cytosine (C). The bases at the top and bottom corners are on the same strand and consist of one base from the adjacent base pairs immediately above and below the complementary pair. Different combinations of top and bottom bases with the left and right complementary pairs would give a total of 20 cavity variations, each being specific for the insertion of one of the 20 amino acids used as building blocks for proteins. See also Nucleotides: Structure and Properties

Although Gamow's hypothesis contributed to the eventual deciphering of the genetic code by introducing the concept that the nucleic acid sequence acts as a template for assembling amino acids into proteins, it was soon disproved by several experimental observations. First, because DNA has a rotational axis perpendicular to the double helix axis it has no specific direction, which means the sequence orientation cannot be determined. Second, it seemed highly unlikely that diamond-shaped cavities could structurally differentiate 20 amino acids with various side-chains. Third, experimental evidence already existed that proteins are produced in the cytoplasm, not in the nucleus where DNA is stored. Fourth, a number of scientists suspected that RNA (ribonucleic acid) is a direct mediator of the transfer of genetic information from DNA to proteins. Fifth, adjacent diamond-shaped cavities shared a nucleotide between them. This overlapping arrangement would inevitably limit which amino acids could adjoin each other sequentially in synthesized proteins; however, no such constraints were apparent.

Around the same time, Gamow also founded the ‘RNA Tie Club’, the aim of which was ‘to solve the riddle of RNA structure and to understand the way it builds proteins’. The club had 20 regular members, each of whom wore an ‘RNA necktie’ representing one of the 20 amino acids used in proteins (Ala was assigned to Gamow and Tyr to Crick). Members presented unpublished reports, known as ‘Notes for the RNA Tie Club’, that examined a variety of ideas and working hypotheses, many of which were later substantiated and greatly influenced experimental work. The systematic study of the genetic code that began with the activities of the RNA Tie Club eventually led to the concept of the triplet codon and its experimental verification.

In 1956, Crick postulated the ‘comma-less code’, in which each amino acid is assigned to a single triplet codon and the triplets do not overlap. Such a code is ‘nondegenerate’ (i.e. each amino acid is encoded by only one codon). To reconcile the 64 possible triplets with the 20 amino acids used in proteins, Crick distinguished a set of 20 ‘sense codons’ corresponding to amino acids; the remaining 44 triplets, which were considered not meaningful, were termed ‘nonsense codons’. First, the four single-base triplets (UUU, CCC, AAA and GGG) were classified as nonsense codons, because a tandem repeat of such a codon (e.g. UUUUUU) reads identically in every frame and so would not allow unambiguous coding for a run of the same amino acid. The remaining 60 triplets fall into 20 groups of three that are cyclic permutations of one another; in each group one triplet was classified as a sense codon, and the two derivative triplets that arise out of frame when it is repeated were deemed nonsense codons. For example, taking UAC as the sense codon, repeating this sequence gives UACUAC. In this case, the triplets ACU and CUA that occur in the sequence are derivative, and thus nonsense, codons. This process results in just 20 sense codons, which matches the number of amino acids found in proteins.
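The counting argument behind the comma-less code can be checked with a short sketch. It is an illustration only, not Crick's own procedure: the function names are hypothetical, and the code merely counts the groups of cyclic permutations without choosing which member of each group would serve as the sense codon.

```python
from itertools import product

BASES = "UCAG"

def rotations(triplet):
    """Return the cyclic permutations of a triplet, e.g. UAC -> {UAC, ACU, CUA}."""
    return {triplet[i:] + triplet[:i] for i in range(3)}

def rotation_classes():
    """Group the 64 triplets by cyclic permutation, excluding UUU, CCC, AAA and GGG."""
    classes = set()
    for bases in product(BASES, repeat=3):
        triplet = "".join(bases)
        if len(set(triplet)) == 1:
            continue  # single-base triplets are treated as nonsense
        classes.add(frozenset(rotations(triplet)))
    return classes

classes = rotation_classes()
print(len(classes))               # 20 groups, one sense codon per group
print(sorted(rotations("UAC")))   # ['ACU', 'CUA', 'UAC']
```

Each of the 20 groups contains exactly three triplets, so choosing one sense codon per group leaves 40 derivative triplets which, together with the four single-base triplets, give the 44 nonsense codons.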

Crick published the above proposal in 1958 in a paper entitled ‘On protein synthesis’, in which two other important hypotheses were also discussed. One was the ‘sequence hypothesis’, which asserted that the specificity of nucleic acids is entirely expressed in the nucleotide sequence, and that this sequence serves as the signal to determine the amino acid sequence. This apparent colinearity between nucleotide sequences of DNA and amino acid sequences of proteins was experimentally supported by studies on sickle cell anaemia carried out by Linus Pauling and Vernon Ingram, as well as by Seymour Benzer's research on the detailed gene structure of T4 phage (a bacteriophage that infects the bacterium Escherichia coli) and the relationship between its mutants and their altered proteins. The second hypothesis proposed by Crick was what he termed the ‘Central Dogma’ of molecular biology, which stated that genetic information flows unidirectionally from DNA to protein via RNA as an intermediary. Thus proteins are the final products of gene expression. This idea served as the guideline for subsequent studies on gene expression, particularly in elucidating the molecular mechanism of protein synthesis. See also Pauling, Linus Carl, Molecular Biology: The Central Dogma, and Commelinales

Topology of the Code as Revealed by Frameshift Mutations

For several years following Crick's proposal of a comma-less code, which was a product of pure theoretical brainstorming, no experimental evidence was forthcoming to support or refute it. Unexpectedly, the breakthrough in solving the ‘coding problem’ came in a contribution from a quite different field – genetic experiments using the T4 bacteriophage. In a study published in 1961, Crick, Sydney Brenner and co-workers provided clear evidence on how the genetic code is read. They had already observed that when E. coli strain B was infected by T4 phage with a mutation in its rII region, a distinctive large plaque (zone of infection) indicative of phage growth was obtained, whereas no such plaque was evident when E. coli strain K12 was infected by the same mutant phage. However, both the wild-type and revertant T4 phages (a revertant is a mutant that has reverted to the wild type) could grow on a medium containing either E. coli strain B or strain K12. By this selective plating technique, a T4 phage mutant and its revertant can easily be distinguished. In their experiments, Crick and colleagues used proflavin, a mutagen that induces base insertions and/or deletions during DNA replication by intercalating between adjacent bases. They first used proflavin to induce a mutation in the T4 phage rII cistron (a cistron is a region of DNA similar to a gene); the resultant mutant (called the frameshift mutant) was named strain FC0. By further treating the frameshift mutant with proflavin, a revertant (called the suppressor mutant) was obtained. Genetic analysis revealed that this suppressor mutant possessed a second mutation very close to the mutation site of strain FC0 (Figure 1). When the first mutation is an insertion (+) and the second is a deletion (−) at a point very near the first mutation site, the revertant carries the combination (+, −), and the original coding frame is restored by the second mutation.

This type of suppressor mutant is referred to as an insertion-suppressed revertant strain. Crick and co-workers isolated various other deletion mutants, such as FC7 and FC9, by recombining the insertion-suppressed revertant strain with the wild type. Similarly, various insertion mutants were obtained by recombining a deletion-suppressed revertant strain with the wild type. These isolated mutant strains were then further recombined in various ways to construct a series of recombinant strains with double and triple mutations, which were observed to have the following characteristic features:

  1. No rIIB cistron activity was observed in either the (+) or (−) single-mutation strains. However, recombinant strains with (+, −) or (−, +) mutations recovered the rIIB phenotype, but only when the loci of the two mutations were very close to each other.

  2. Recombinant strains with (−, −) or (+, +) mutations were inactive, but triple mutants with mutations of the same type, i.e. (+, +, +) or (−, −, −), recovered the rIIB phenotype. Recombinant strains with triple mutations of different types, for example (+, +, −) or (+, −, −), showed no rIIB activity.

See also Brenner, Sydney, Bacteriophage T4, Gene Expression: Frameshifting, and Mutations and the Genetic Code

Figure 1. Gene map of frameshift mutants of T4 phage rIIB cistron caused by proflavin. Modified from Crick FHC, Barnett L, Brenner S and Watts-Tobin RJ (1961) Nature 192: 1227–1232.

These results can be interpreted by referring to Figure 2. The messenger RNA (mRNA) is read with the defined reading frame up to the first mutation point, but from there the frame is shifted one base downstream or upstream by a (+) or (−) mutation, respectively. If the second mutation that suppresses the first occurs downstream in the close vicinity of the first mutation point, the correct reading frame is restored from this second mutation point. If the region between the first and second mutations (in which the reading frame is shifted) is short, the mutation combination has a minimal effect on the protein synthesis process and functional proteins are produced; a mutant with this combination shows the wild-type phenotype. This experimental finding immediately disproved Crick's idea of a comma-less code – if it held, a nonsense codon would arise at any insertion (+) or deletion (−) mutation site, leading to the termination of protein synthesis. Why do the triple mutants (+, +, +) and (−, −, −) manifest the wild-type phenotype? The answer is that the original coding frame is recovered by the insertion or deletion of three bases at different loci. Frameshift mutations in T4 phage thus revealed the following fundamental intrinsic characteristics of the genetic code: the code is read from a defined point with a reading frame made up of three-base combinations, and it is not overlapping but degenerate (i.e. there is more than one codon for most amino acids). See also Messenger RNA in Prokaryotes, Messenger RNA in Eukaryotes, and Messenger RNA: Interaction with Ribosomes

Figure 2. Frameshift mutants of T4 phage rIIB cistron caused by proflavin. Addition and deletion on the nucleotide sequence. Redrawn from Crick FHC, Barnett L, Brenner S and Watts-Tobin RJ (1961) Nature 192: 1227–1232.
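The behaviour of the (+), (+, −), (+, +) and (+, +, +) combinations can be illustrated with a minimal sketch. The message below is an arbitrary, hypothetical sequence rather than any part of the rIIB cistron, and the helper names are assumptions made for the illustration.

```python
def codons(seq):
    """Read a message in triplets from its first base, ignoring an incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def insert(seq, pos, base):
    """Return seq with one base inserted before position pos, a (+) mutation."""
    return seq[:pos] + base + seq[pos:]

def delete(seq, pos):
    """Return seq with the base at position pos removed, a (-) mutation."""
    return seq[:pos] + seq[pos + 1:]

def frame_restored(wild, mutant, n=4):
    """True if the last n codons of the mutant match those of the wild type."""
    return codons(wild)[-n:] == codons(mutant)[-n:]

wild = "AUGGCUCAUUCGGAUCAGCCUGUACGU"      # hypothetical message, 9 codons
plus = insert(wild, 4, "A")               # (+)
plus_minus = delete(plus, 10)             # (+, -) at nearby sites
plus_plus = insert(plus, 9, "G")          # (+, +)
triple_plus = insert(plus_plus, 14, "C")  # (+, +, +)

for name, seq in [("+", plus), ("+,-", plus_minus),
                  ("+,+", plus_plus), ("+,+,+", triple_plus)]:
    print(name, " ".join(codons(seq)), frame_restored(wild, seq))
```

In this sketch only the (+, −) and (+, +, +) messages end in the same codons as the wild type; the codons between the first and last mutation sites remain altered, which is why the mutations must lie close together for the protein to stay functional.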

The Code Decoded Using Protein Synthesis In Vitro

It was also in 1961 that Marshall Nirenberg and Johann Matthaei succeeded for the first time in assigning an element of the genetic code by using a cell-free E. coli protein synthesis system containing ribosomes, S100 fraction, transfer RNA (tRNA), amino acids, adenosine triphosphate (ATP), an energy-recycling system and a template. In this cell-free system, a defined isotope-labelled amino acid and the other 19 unlabelled amino acids were incubated with a template – either native DNA, RNA or synthetic polyribonucleotides. Selective incorporation of an amino acid was observed by isolation and analysis of aggregated proteins, in the form of the acid-insoluble fraction, in the protein synthesis system. As a result, Nirenberg and Matthaei discovered that adding poly(U) – a long chain of repeating uridylic acid units – to the cell-free system enhanced the incorporation of Phe, demonstrating that in the genetic code UUU corresponds to the amino acid phenylalanine (Phe) (Table 1). This finding was reported at the International Biochemical Congress held in Moscow in 1961 and the news rapidly spread around the world. By that time, Nirenberg's group had already succeeded in elucidating a second element of the genetic code: CCC codes for proline (Pro). See also Nirenberg, Marshall Warren

Table 1. Poly(U)-dependent poly(Phe) synthesis

Condition                        Incorporation of [14C]Phe (counts/min)
Complete (a)                     29 500
− poly(U)                        70
− ribosome                       52
− S100                           106
− ATP
− energy regenerating system     83
+ RNAase                         120
+ DNAase                         27 600

(a) The complete system comprises ribosome, tRNA, poly(U), [14C]Phe, 19 other unlabelled amino acids, S100 (a mixture of enzymes necessary for protein synthesis), ATP, and the energy regenerating system in the buffer.

Phe, phenylalanine.

Modified from Nirenberg MW and Matthaei JH (1961) Proceedings of the National Academy of Sciences of the USA 47: 1588–1602.

As soon as news of the breakthrough in deciphering the code reached Severo Ochoa's laboratory, his group began synthesizing various RNA copolymers in which two kinds of nucleotide were mixed at fixed ratios; they then examined the composition of the codon triplets. An example is shown in Table 2. The copolymer Poly(5A1C), synthesized by mixing five parts of A and one part of C, contained A and C in a 5 : 1 ratio but in a random order. Ochoa's analysis showed that poly(5A1C) enhanced the incorporation of the amino acids Asp, Gln, His, Lys, Pro and Thr. Statistically, the three triplets in the copolymer made up of 2A1C (AAC, ACA and CAA) could be expected to enhance the incorporation of the relevant amino acids by a proportion one-fifth that of AAA (Lys). Similarly, the triplets consisting of 1A2C could be expected to incorporate the relevant amino acids by a proportion 1/25 that of AAA (Lys). As shown in Table 2, a good relationship was observed between the theoretical and experimental values, demonstrating that codon triplets consisting of 2A1C are likely to code for Asp, Gln and Thr, while those comprising 1A2C would code for His, Pro and Thr. Nirenberg's group carried out similar experiments, and by 1964 the nucleotide compositions of all the codon triplets corresponding to all 20 amino acids were determined. However, the problem of matching each specific codon sequence to the corresponding amino acid remained. See also Ochoa, Severo

Table 2. Incorporation of amino acids directed by poly(5A1C) as an artificial mRNA in the in vitro protein synthesis system

Amino   Relative amount of     Assumed        Calculated frequency of          Sum of occurrence
acid    amino acid             codon          occurrence of a triplet          frequencies of
        incorporation                         3A      2A1C    1A2C    3C       triplets
        (experimental)
Asp     24.2                   2A1C           –       20      –       –        20
Gln     23.7                   2A1C           –       20      –       –        20
His     6.5                    1A2C           –       –       4.0     –        4
Lys     100                    3A             100     –       –       –        100
Pro     7.2                    1A2C, 3C       –       –       4.0     0.8      4.8
Thr     26.5                   2A1C, 1A2C     –       20      4.0     –        24

Modified from Speyer JF, Lengyel P, Basilio C et al. (1963) Cold Spring Harbor Symposia on Quantitative Biology 28: 559–567.
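The calculated frequencies in Table 2 follow from treating each position of a triplet as an independent draw from the 5 : 1 mixture of A and C. The sketch below reproduces that arithmetic under this assumption of random incorporation; the function name and its parameters are hypothetical.

```python
from fractions import Fraction
from itertools import product

def relative_triplet_frequencies(parts_a=5, parts_c=1):
    """Frequency of every A/C triplet in a random copolymer, scaled so AAA = 100."""
    total = parts_a + parts_c
    p = {"A": Fraction(parts_a, total), "C": Fraction(parts_c, total)}
    freq = {}
    for bases in product("AC", repeat=3):
        # Independent positions: multiply the three base probabilities.
        freq["".join(bases)] = p[bases[0]] * p[bases[1]] * p[bases[2]]
    reference = freq["AAA"]
    return {t: 100 * f / reference for t, f in freq.items()}

freq = relative_triplet_frequencies()
for triplet in ("AAA", "AAC", "ACC", "CCC"):
    print(triplet, float(freq[triplet]))
# AAA 100.0, AAC 20.0, ACC 4.0, CCC 0.8
```

Summing one 2A1C triplet and one 1A2C triplet gives 24, the value listed for Thr, and one 1A2C triplet plus CCC gives 4.8, the value listed for Pro.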

This matching problem was also solved by Nirenberg's group. They designed a novel experiment utilizing the finding that when a triplet codon sequence is added to a cell-free system, ribosomes specifically bind the aminoacyl-tRNA possessing an anticodon complementary to the codon sequence added. By trapping the resulting complex (i.e. ribosomes attached to the aminoacyl-tRNA and the codon sequence) on a nitrocellulose membrane and then identifying the amino acid attached to the tRNA, each codon was matched with its respective amino acid. As a result of this second breakthrough, the nucleotide sequences of most triplet codons were unambiguously assigned (Table 3). Nirenberg's group also experimentally demonstrated for the first time that the minimal unit of a codon is a triplet, by confirming that specific binding of aminoacyl-tRNA to ribosomes was observed only when the template oligonucleotide added to the ribosome system was a trinucleotide or longer.

Table 3. Binding of aminoacyl-tRNA to ribosomes stimulated by trinucleotides

Trinucleotides                      Aminoacyl-tRNA
UUU, UUC                            Phe
UUA, UUG, CUU, CUC, CUA, CUG        Leu
AUU, AUC, AUA                       Ile
AUG                                 Met
GUU, GUC, GUA, GUG                  Val
UCU, UCC, UCA, UCG                  Ser
CCU, CCC, CCA, CCG                  Pro
AAA, AAG                            Lys
UGU, UGC                            Cys
GAA, GAG                            Glu

At almost the same time, Har Gobind Khorana's group developed a different method to determine the nucleotide sequences of codon triplets. They succeeded in synthesizing polyribonucleotides with defined repeating sequences by using RNA polymerase to transcribe a synthetic DNA template, which was itself built up with DNA polymerase from a short, chemically synthesized DNA fragment of defined sequence. In this approach, copoly(AG) was synthesized with a sequence of alternating A and G nucleotides. Copoly(UC), copoly(AGU) and other sequences were similarly synthesized. It was found that copoly(AG) and copoly(UC) produced polypeptides with alternating amino acids Arg and Glu, copoly(Arg-Glu), and with alternating Ser and Leu, copoly(Ser-Leu), respectively. Information gained from this experiment, combined with the already established relationships between amino acids and the nucleotide compositions of corresponding codons, enabled AGA, GAG, UCU and CUC to be respectively matched with Arg, Glu, Ser and Leu. Thus, 61 codon triplets were unambiguously assigned to 20 amino acids (Table 4). The remaining three triplets were determined to be termination codons as described below. See also Khorana, Har Gobind

Table 4. Incorporation of amino acids stimulated by artificial mRNAs with alternating sequences

Polynucleotide   Synthesized polypeptides           Assigned codons
poly(UC)         poly(Ser-Leu)                      UCU: Ser, CUC: Leu
poly(AG)         poly(Arg-Glu)                      AGA: Arg, GAG: Glu
poly(UG)         poly(Val-Cys)                      GUG: Val, UGU: Cys
poly(AC)         poly(Thr-His)                      ACA: Thr, CAC: His
poly(UAC)        poly(Tyr), poly(Thr), poly(Leu)    UAC: Tyr, ACU: Thr, CUA: Leu
poly(GUA)        poly(Val), poly(Ser)               GUA: Val, AGU: Ser, (UAG: stop)
poly(UUG)        poly(Leu), poly(Cys), poly(Val)    UUG: Leu, UGU: Cys, GUU: Val
poly(UAUC)       poly(Tyr-Leu-Ser-Ile)              UAU: Tyr, CUA: Leu, AUC: Ile, UCU: Ser
poly(UUAC)       poly(Leu-Leu-Thr-Tyr)              UUA: Leu, UAC: Tyr, ACU: Thr, CUU: Leu
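Why a dinucleotide repeat yields one alternating polypeptide, a trinucleotide repeat yields up to three homopolypeptides, and a tetranucleotide repeat yields a repeating tetrapeptide can be seen by reading the repeats in each frame. The following sketch is an illustration only: the function name is hypothetical, and the small lookup table contains just the assignments listed in Table 4 that the examples need.

```python
# Assignments taken from Table 4 (only those used below).
ASSIGNED = {"UCU": "Ser", "CUC": "Leu", "UAC": "Tyr", "ACU": "Thr",
            "CUA": "Leu", "UAU": "Tyr", "AUC": "Ile"}

def frame_codons(unit, frame=0, n_codons=4):
    """Codons read from a long tandem repeat of `unit`, starting `frame` bases in."""
    message = unit * (3 * n_codons)  # always long enough for the requested codons
    return [message[frame + 3 * i: frame + 3 * i + 3] for i in range(n_codons)]

def peptide(unit, frame=0, n_codons=4):
    """Translate the codons of one reading frame with the Table 4 assignments."""
    return "-".join(ASSIGNED[c] for c in frame_codons(unit, frame, n_codons))

print(peptide("UC"))                           # Ser-Leu-Ser-Leu
print([peptide("UAC", f) for f in range(3)])   # poly(Tyr), poly(Thr), poly(Leu)
print(peptide("UAUC"))                         # Tyr-Leu-Ser-Ile
```

For poly(UAC) each of the three frames repeats a single codon (UAC, ACU or CUA), whereas for poly(UC) and poly(UAUC) every frame cycles through the same set of codons, so a single repeating polypeptide results.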

Termination and Initiation Codons

Termination codons were resolved by molecular genetic studies of T4 phage nonsense mutants. These mutants are unable to grow in E. coli cells because synthesis of T4 proteins is terminated at the nonsense mutation sites and as a result only premature proteins are synthesized by ribosomes in abortive translation. It was found, however, that T4 phage nonsense mutants could grow in an E. coli strain called suppressor mutant Su+. This Su+ strain was named the ‘amber suppressor’ after Harris Bernstein, a graduate student in Richard Epstein's laboratory who was involved in the discovery of the T4 phage mutants (Bernstein is the German for ‘amber’). Brenner's group treated T4 phage with a mutagen and used the resultant mutant to infect the E. coli wild-type strain (Su) or the suppressor mutant strain (Su+). By selectively collecting T4 phages that grew in Su+ but not in Su they succeeded in isolating the desired mutant T4 phage.

When the head protein of the mutant phage grown in wild-type Su and mutant Su+ E. coli cells was sequenced, the amino acid sequence of the protein obtained from the Su+ cells was identical to that of the wild-type T4 phage-infected cells except for the replacement of a single amino acid: Gln in the wild-type T4 phage-infected cells was replaced by Ser in the Su+ protein. The protein from the Su cells comprised a premature fragment whose sequence was identical to that of the wild type up to the amino acid immediately before Gln (Table 5). As such a mutation would have resulted from a mutation at a single base, it was very likely that a single base mutation in the Gln codon in the wild-type cells resulted in a termination codon in the Su cells. Similarly, another single base mutation in the termination codon would have given rise to the Ser codon in Su+ cells. When the T4 phage mutant was used to infect different Su+ strains and the head proteins of the resultant T4 phages were sequenced, several amino acids were found to be inserted into the position originally occupied by Gln in the wild-type protein (Figure 3). Given that a single base mutation is the most plausible mechanism to explain these phenomena, it was concluded that the termination codon found in Su cells must be UAG. Alan Garen's group also arrived at the same conclusion in experiments using amber mutants of E. coli alkaline phosphatase.

Figure 3. Replacement of the amber codon with amino acids in the head protein of T4 phage amber mutant grown in various Escherichia coli Su+ strains.
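The single-base-change reasoning can be made concrete by listing every codon that lies one substitution away from a candidate triplet. The sketch below uses the standard code as known today, written in one-letter amino acid symbols with `*` for termination, which is an anachronism relative to the original experiments; the function name is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code in U, C, A, G order for each position; '*' marks termination.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def single_base_neighbours(codon):
    """All nine codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]

for neighbour in single_base_neighbours("UAG"):
    print(neighbour, CODE[neighbour])
```

Under these assumptions UAG is one substitution away from a Gln codon (CAG) and a Ser codon (UCG), in keeping with the replacements described above, and also from codons for Lys, Glu, Leu, Trp and Tyr.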

Table 5. Comparison of amino acid sequences of T4 phage head protein between a wild-type phage and its mutant phage H36 grown in either Su or Su+ E. coli strain
Modified from Brenner S, Stretton AO and Kaplan S (1965) Nature 206: 994–998.

Later, two other nonsense mutations of T4 phage were also isolated, and E. coli mutants that could suppress these mutations were concomitantly found. In a similar manner to that described above, the nonsense codons responsible for these mutations were identified as UAA and UGA, which were named ‘ochre’ and ‘opal’ respectively. These three codons (UAG, UAA and UGA) that code for no amino acid but instead cause protein synthesis to terminate were named ‘termination’ (or ‘stop’) codons.

In 1965, Mario Capecchi's group demonstrated that termination codons are also able to function in an in vitro protein synthesis system. They used an R17 phage amber mutant in which the codon coding for the seventh amino acid of the coat protein, Glu, was replaced by the termination codon UAG. When R17 mutant RNA was translated as a template in an in vitro translation system prepared from the E. coli Su strain, a peptide consisting of six amino acids was produced. On the other hand, when it was translated in a system prepared from the E. coli Su+ strain, a full-length coat protein was produced with Ser replacing Glu at the seventh position. This result demonstrated that the UAG amber codon does indeed serve as a terminator of protein synthesis.

Further investigation of Su and Su+ E. coli strains also revealed that tRNA is responsible for this suppression mechanism. The tRNA identified with the suppressor function was named ‘suppressor tRNA’. Howard Goodman was the first to sequence the amber suppressor tRNATyr (in 1968). He found that the anticodon of the suppressor tRNA was changed to CUA, allowing base pairing with UAG of the amber codon. For ochre suppressor tRNAGln with the UAA codon and opal suppressor tRNATrp with the UGA codon, the corresponding anticodons U*UA and U*CA were respectively identified.
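The relationship between each termination codon and its suppressor anticodon is antiparallel base pairing, which can be sketched as follows. The sketch assumes strict Watson–Crick pairing and ignores wobble pairing and modified bases such as the U* mentioned above; the function name is hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Anticodon (written 5'->3') that pairs antiparallel with a codon written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(codon))

for name, codon in [("amber", "UAG"), ("ochre", "UAA"), ("opal", "UGA")]:
    print(name, codon, anticodon(codon))
# amber UAG CUA, ochre UAA UUA, opal UGA UCA
```

On the same assumption, the Tyr codon UAC pairs with the anticodon GUA, which differs from the amber suppressor anticodon CUA at a single position.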

The initiation codon was identified mainly through the efforts of two groups. In 1966, Norton Zinder and colleagues found that the N-terminus of proteins synthesized by an in vitro translation system using a phage RNA as a messenger is always formyl-Met. In the same year, Brian Clark and Kjeld Marcker identified two species of tRNAMet in E. coli cells, one of which bound to formyl-Met while the other bound only to Met. From these findings, it was deduced that the initiation codon used in the translation was AUG and that it coded for Met.

The initiation site in mRNA was discovered by Joan Steitz in 1969 using RNA of R17 phage. She formed an initiation complex in which 32P-labelled R17 phage RNA was bound to the ribosome under the initiation conditions for protein synthesis (consisting of fMet-tRNA, guanosine triphosphate (GTP) and initiation factors). The 32P-labelled RNA in the complex was digested with pancreatic ribonuclease and the nucleotide sequences of the 32P-labelled RNA fragments, which were protected from RNAase digestion by the ribosome, were determined. As shown in Figure 4, the protected RNA fragments contained the sequences coding for the N-terminal amino acids of all three phage proteins. The result unambiguously demonstrated that the initiation codon was AUG. A similar finding was obtained using Qβ phage RNA.

Figure 4. Nucleotide sequences of the translation initiation sites in the coat protein, replicase and A protein of R17 RNA. The underlined sequences are the Shine–Dalgarno sequences. Redrawn from Steitz (1969) Nature 224: 957–964.

In Vivo Code

As described above, all the elements of the genetic code were deciphered using in vitro protein synthesis systems. Demonstrating a functioning code in vitro, however, does not guarantee that it functions in vivo. In 1966, George Streisinger's group conducted an experiment to confirm the in vivo validity of the deduced genetic code by analysing the T4 phage lysozyme gene and the amino acid sequence of the corresponding protein. Because the lysozyme is a single protein unit and is relatively easy to purify, it was considered a good model to investigate whether changes in a gene are reflected in the expressed protein. Streisinger's group prepared the double mutant eJ42eJ44 by recombining the genes of two frameshift mutants, eJ42 and eJ44, and compared the amino acid sequence of its lysozyme with that of the wild-type lysozyme. In the mutant sequence, five amino acid residues were changed from those of the wild-type lysozyme (Figure 5), indicating that two mutations, a base deletion and a compensating insertion, had occurred very close to each other and that between the two mutations the reading frame had shifted. When the experimental results were evaluated according to the genetic code table, the mutation combination ‘deletion of A followed by addition of G’ (see Figure 5) alone could provide an unambiguous explanation. Heinz Fraenkel-Conrat's and Heinz-Günter Wittmann's groups obtained similar results in their experiments using tobacco mosaic virus RNA. See also Lysozyme

Figure 5. Comparison between amino acid sequences of lysozyme of wild-type T4 phage and eJ42eJ44 frameshift mutant. Determination of in vivo code. Redrawn from Terzaghi E, Okada Y, Streisinger G et al. (1966) Proceedings of the National Academy of Sciences of the USA 56: 500–507.

The following intrinsic characteristics of the gene and its expression were revealed by the work discussed in the foregoing sections:

  1. The genetic code elements deciphered by in vitro experiments are also used in vivo.

  2. Genetic information is read from a defined point in the nucleotide sequence with a reading frame consisting of three-base combinations.

  3. The direction of mRNA translation is from the 5′ to the 3′ end (deduced from another experimental study not discussed here).

The Universal Genetic Code

In 1966, a symposium was held at Cold Spring Harbor in the USA to discuss the coding problem. Participants included the leading molecular biologists of the day and other relevant researchers. By combining all the experimental results thus far obtained, a genetic code table was constructed: after discussion and evaluation of the research findings, the 64 codons were assigned either to one of the 20 amino acids or to termination. Crick also contributed to the arrangement of the genetic code table that is used today (Figure 6). As the genetic code had been found to be common to all the organisms examined – bacteria, yeasts, viruses, plants and animals – it was named the ‘universal genetic code’. The concept of this universal code led to the hypothesis that all living organisms on earth derive from a common origin. See also Cold Spring Harbor Laboratory

Figure 6. The universal genetic code. At the time of the Cold Spring Harbor Symposium (1966) the UGA opal codon had not been identified, so not all the codons had been allocated. The initiation codon was also uncertain.
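The layout and degeneracy of the universal code table can be reproduced with a short sketch. It encodes the standard code as known today, in one-letter amino acid symbols with `*` for termination; the variable names are assumptions made for the illustration.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code, rows ordered by first, second, then third base (U, C, A, G).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

degeneracy = Counter(CODE.values())
stop_codons = sorted(c for c, aa in CODE.items() if aa == "*")
sense_codons = len(CODE) - len(stop_codons)
amino_acids = len(degeneracy) - 1   # exclude the '*' entry

print(stop_codons)                  # ['UAA', 'UAG', 'UGA']
print(sense_codons, amino_acids)    # 61 20
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])   # 6 1 1
```

The count of 61 sense codons for 20 amino acids, with only Met and Trp specified by a single codon, is the degeneracy referred to in the text.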

Progress after the Cold Spring Harbor Symposium

Active research continued after the Symposium. Using R17 RNA, Steitz confirmed the code by comparing the nucleotide sequence of native RNA and the amino acid sequence synthesized from the RNA (1969). Walter Fiers’ group determined the entire nucleotide sequence of MS2 phage RNA (3569 nucleotides) and confirmed the codon assignments in the genetic code table in the process (1976). In another direction, developments in genetic engineering allowed DNA to be sequenced. Sanger's group was the first to determine a complete DNA nucleotide sequence – that of ΦX phage (5386 nucleotides; 1978). The DNA nucleotide sequences of the phage fd (6408 nucleotides; 1978) and the mammalian virus SV40 (5243 base pairs; 1978) were subsequently determined. By directly comparing the nucleotide sequence of RNA, or DNA, with the amino acid sequence of the corresponding protein, the accuracy of the universal genetic code table in vivo was confirmed. See also DNA Virus Genomes, and RNA Virus Genomes

Nowadays, DNAs from diverse sources – from viruses and microorganisms to plants and animals – can be analysed easily to determine their nucleotide sequences. The amino acid sequences of proteins encoded by genes can also be readily determined. By comparing the homologies of amino acid sequences of known proteins with those of unknown ones, the structures and functions of unknown proteins can be deduced. Projects to determine the genomes of various organisms, including the human genome, have met with success: the complete DNA sequences of about 100 species have already been determined and the next task is to identify each gene by elucidating its biological function. In pursuing these goals, attention must be paid to nonuniversal genetic codes; some organisms have been found to use unconventional codes, and others as yet unknown may also do so. See also Genome Mapping, Viruses: Genomes and Genomics, The Promise of Whole Genome Sequencing, Genome, Proteome and the Quest for a Full Structure–Function Description of an Organism, and Genome Sequence Analysis

Glossary
Comma-less code

A hypothetical code that was proposed by Crick, in which each amino acid is specifically assigned to a corresponding single codon and where the codon triplets do not overlap. The comma-less code consists of 20 ‘sense codons’ corresponding to 20 amino acids and 44 ‘nonsense codons’ which are considered not meaningful (for details see text).

Frameshift mutation

A mutation caused by a base addition or deletion in the coding sequence, resulting in a shift in the coding frame. The amino acid sequence of the resulting protein is altered downstream from the mutation point.

Nonsense mutant

A mutant in which a nonsense mutation exchanges a sense codon for a termination codon. This results in a premature protein that is often inactive and can be lethal for the nonsense mutant bacteria or virus produced.

Phage

A type of virus that infects only bacteria, and thus is called a bacteriophage. There are different kinds of phage, those that possess single-stranded DNA (such as f1 and fd phages) or double-stranded DNA (T2 and T4 phages) and others that possess single-stranded RNA (R17 and f2 phages).

RNA copolymer

RNA comprising two or more kinds of nucleotide. Copolymers are the opposite of homopolymers, which comprise only a single kind of nucleotide. RNA copolymers in which the nucleotides are arranged alternately are called alternate copolymers and those containing randomly arranged nucleotides are called random copolymers.

Tandem repeats

A repeated sequence composed of defined components. For example, tandem repeats of AUG give the sequence AUGAUGAUGAUG.

Further Reading
