Cracking the genetic code

Making sense of the language of DNA to help transform 21st century healthcare and beyond

In 1961, John Pfeiffer, a journalist at the New York Times, stated that the biggest news story of the year “was not the orbiting of the Russian astronauts, it was the cracking of a biological code, which governs all the processes of life” [1]. Pfeiffer was describing the genetic code: a fundamental ‘dialect’ shared by nearly all lifeforms on Earth.

The discovery of the structure of DNA in 1953 was heralded as groundbreaking. Significant as it was, the process by which the information contained within DNA was translated into proteins was still unknown. How could the four bases of DNA be translated into the 20 amino acids that form the building blocks of proteins?

The RNA Tie Club

In 1954, to stimulate progress, the Russian physicist George Gamow decided that a collaborative effort would be the best way to solve this mystery. Drawing on expertise from different fields across the world, he formed ‘The RNA Tie Club’ with the intent “to solve the riddle of the RNA structure and to understand how it built proteins”. The club featured prominent scientists, including James Watson and Francis Crick, and had 20 members in total, each wearing a woollen necktie embroidered with a helix and each representing one of the amino acids.

Several findings came from the group, including one from Gamow himself. Using mathematics, he demonstrated that a code word of three nucleotide letters was the minimum length needed to cover all 20 amino acids:

  • If the code was only one letter long, only four amino acids could be coded for – one for each of the four DNA/RNA nucleotide bases: A, T (U in RNA), C and G.
  • If the code was two letters long (4×4), it would cover only 16 of the amino acids, but three letters (4×4×4) could produce 64 combinations, safely covering all 20 amino acids.
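The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the function and variable names are made up for this example and do not come from Gamow's work.

```python
# Illustrative sketch of the counting argument; all names are hypothetical.
BASES = "ACGU"        # the four RNA bases
N_AMINO_ACIDS = 20    # amino acids the code must cover


def codon_count(length, n_bases=len(BASES)):
    """Number of distinct code words of a given length over the base alphabet."""
    return n_bases ** length


def min_codon_length(n_targets=N_AMINO_ACIDS, n_bases=len(BASES)):
    """Smallest word length whose combinations can cover n_targets items."""
    length = 1
    while codon_count(length, n_bases) < n_targets:
        length += 1
    return length


for length in (1, 2, 3):
    enough = codon_count(length) >= N_AMINO_ACIDS
    print(length, codon_count(length), enough)
# prints:
# 1 4 False
# 2 16 False
# 3 64 True
```

Calling min_codon_length() returns 3, matching the conclusion that triplets are the shortest words able to cover 20 amino acids.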

However, the first to decipher the code was not a member of the Tie Club. That accolade fell to an American biochemist, Marshall Nirenberg. In 1961, along with his colleague Johann H Matthaei, Nirenberg showed that a triplet of uracils (UUU) coded for the amino acid phenylalanine (F). At last, the genetic code had been cracked. Nirenberg had revealed how the information to build proteins was translated from the genetic material. Over the next five years, Nirenberg and his team determined all 64 codons and their corresponding amino acids. His discoveries led to the Nobel Prize in Physiology or Medicine, which he shared in 1968 with Robert W. Holley and Har Gobind Khorana.

The code itself

The code is written using the four nucleotide bases found in RNA: adenine (A), uracil (U), cytosine (C) and guanine (G). Assembling the four bases into triplets (codons) produces 64 combinations: 61 code for amino acids and the remaining 3 are stop signals, instructing the translation machinery to stop making a polypeptide chain.
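As a rough sketch, the standard codon table can be rebuilt in Python from a compact 64-letter string of one-letter amino acid codes, listed in the conventional U, C, A, G base order with * marking a stop signal. The names CODON_TABLE and translate are chosen for this example only, and the translate helper is deliberately simplified: it reads from the first base and ignores start codons and reading-frame selection.

```python
from itertools import product

# Standard genetic code, third base cycling fastest in U, C, A, G order.
# '*' marks a stop signal. Names below are hypothetical, for illustration.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 triplets to its amino acid (or stop signal).
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]


def translate(rna):
    """Read an RNA string three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE), len(SENSE_CODONS), STOP_CODONS)
# prints: 64 61 ['UAA', 'UAG', 'UGA']
print(translate("UUUUUUUUU"))
# prints: FFF
```

The second call mirrors the poly-U result described earlier: a run of uracils reads as repeated phenylalanine (F).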

Though there are more codon combinations than amino acids, this is not a biological slip-up: having more codons than amino acids builds in a tolerance for error. For instance, the codon UCA codes for the amino acid serine. If the A in the third position were changed to a C (UCC), the codon would still code for serine. This property, known as redundancy (or degeneracy), reduces the chance that a mistake in the DNA is carried through into the protein, where it could be detrimental to the organism.
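The serine example can be reproduced with a small lookup, sketched here under the assumption that the standard codon table applies. The helper names third_base_variants and synonymous_codons are invented for this illustration.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; '*' marks a stop signal.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def third_base_variants(codon):
    """Amino acid for each codon that shares the first two bases of `codon`."""
    prefix = codon[:2]
    return {prefix + base: CODON_TABLE[prefix + base] for base in BASES}


def synonymous_codons(amino_acid):
    """All codons that code for the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(third_base_variants("UCA"))
# prints: {'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S'}
print(synonymous_codons("S"))
# prints: ['AGC', 'AGU', 'UCA', 'UCC', 'UCG', 'UCU']
```

Not every third-position change is silent. The same lookup for UGG (tryptophan) returns cysteine for UGU and UGC and a stop signal for UGA, so redundancy lowers the risk of harmful changes rather than removing it.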

Solving the riddle of the genetic code has paved the way for tremendous advances in molecular biology. DNA and RNA can now be pieced together in the laboratory to produce recombinant proteins, which are used to study cellular processes and the mechanisms of disease and to develop new therapeutic treatments.

  1. National Institutes of Health, Department of Health & Human Services