This article is the first in our series of Biology Basics, which will explain how our cells work. Illustration and article by Abid Javed.
DNA carries the genetic code for all the components that a cell needs in order to function. These include the proteins that direct how a cell, and hence a tissue (a collective mass of cells), functions in our body.
The double helix structure of DNA was first described by James D. Watson and Francis Crick at the University of Cambridge, drawing on X-ray diffraction data that included the work of Dr. Rosalind Franklin (1). DNA molecules are built from four nucleotide bases, each abbreviated to one letter: A (Adenine), T (Thymine), G (Guanine) and C (Cytosine). Because of their chemical composition, these bases form specific hydrogen bonds with each other, and these bonds hold the two strands of the double helix together. Within the DNA structure, A pairs only with T, and G pairs only with C. This pairing specificity reflects the number of hydrogen bonds each base can make: A-T pairs have two bonds and G-C pairs have three. The number of bonds also determines the relative stability of each pair (the A-T pair is less stable than the G-C pair), and the collective strength of these hydrogen bonds keeps the double helix stable. Although it is prone to certain forms of damage, a DNA molecule can survive for thousands of years, as shown by the recent recovery and sequencing of DNA from a piece of woolly mammoth hair (2).
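The pairing rules can be written down as a small lookup table. The Python sketch below is only an illustration: the function names and the example sequence are made up for this article, and it ignores the direction in which each strand is read.

```python
# Pairing rules from the text: A pairs with T, and G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair: two for A-T, three for G-C.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand: str) -> str:
    """Return the base that pairs with each base of the strand, in the same order."""
    return "".join(PAIR[base] for base in strand.upper())


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding this strand to its partner strand."""
    return sum(H_BONDS[base] for base in strand.upper())


example = "ATGCGT"  # hypothetical sequence, chosen only for illustration
print(complement(example))      # TACGCA
print(hydrogen_bonds(example))  # 2 + 2 + 3 + 3 + 3 + 2 = 15
```

A stretch rich in G and C is held together by more hydrogen bonds than an A-T-rich stretch of the same length, which is the stability difference described above.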
The nucleus of almost every cell in your body contains a full copy of your DNA code, which comprises about 3 billion base pairs; laid end to end, the DNA in a single nucleus would measure roughly 2 metres. Given this length, how can DNA fit within the tiny nucleus? It is made possible by DNA-packing proteins called histones, around which the DNA strands wrap like thread on a spool. These histone-bound DNA strands are packaged as chromosomes in the nucleus. During transcription, the synthesis of messenger RNA (mRNA) from DNA, the histone-bound DNA is unwrapped and the two strands of the double helix separate, allowing RNA polymerase and the rest of the transcription machinery to bind to the target gene and synthesise RNA.
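A back-of-the-envelope calculation shows the scale of the packing problem. The sketch below rests on assumptions that are not taken from this article: a spacing of about 0.34 nanometres between neighbouring base pairs, two copies of the code per nucleus (one from each parent), and a nucleus about 6 micrometres across. Treat the result as an order of magnitude only.

```python
BASE_PAIRS_PER_COPY = 3_000_000_000  # figure quoted in the text

# Assumptions (not from the article), used only for a rough estimate:
RISE_PER_BASE_PAIR_M = 0.34e-9  # approximate spacing between base pairs, in metres
COPIES_PER_NUCLEUS = 2          # one copy inherited from each parent
NUCLEUS_DIAMETER_M = 6e-6       # a typical nucleus is a few micrometres wide

length_per_copy_m = BASE_PAIRS_PER_COPY * RISE_PER_BASE_PAIR_M
total_length_m = length_per_copy_m * COPIES_PER_NUCLEUS
length_to_width = total_length_m / NUCLEUS_DIAMETER_M

print(f"Length of one copy: {length_per_copy_m:.2f} m")
print(f"Length per nucleus: {total_length_m:.2f} m")
print(f"DNA is about {length_to_width:,.0f} times longer than the nucleus is wide")
```

Under these assumptions the DNA is hundreds of thousands of times longer than the compartment that holds it, which is why the histone spools are needed.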
A gene is a sequence in the DNA that is essentially the code for a particular protein, which is synthesised via mRNA. Each gene comprises regions that encode the protein itself, called exons, interspersed with non-protein-coding regions called introns, which are believed to have a regulatory role during gene transcription. Introns are ‘spliced out’ by the spliceosome machinery after the DNA is transcribed into RNA, leaving the mature mRNA. Different combinations of a gene's exons can be used to form mRNA. These are called splice forms, and they allow proteins with different functions to be encoded by the same genetic sequence.
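Splicing can be pictured as cutting labelled segments out of a string. The toy gene below is entirely hypothetical (the sequences and function names are invented for illustration), and the sketch uses one fact not stated in this article: RNA carries the base U (uracil) wherever DNA carries T.

```python
# Hypothetical toy gene: an ordered list of (segment type, DNA sequence).
GENE = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGT"),
    ("exon", "TTCGAA"),
    ("intron", "GTCCAG"),
    ("exon", "GGATAA"),
]


def transcribe(gene):
    """Copy every segment into RNA, writing U wherever the DNA has T."""
    return [(kind, seq.replace("T", "U")) for kind, seq in gene]


def splice(pre_mrna, keep_exons=None):
    """Drop all introns and join the chosen exons (all of them by default)."""
    exons = [seq for kind, seq in pre_mrna if kind == "exon"]
    if keep_exons is None:
        keep_exons = range(len(exons))
    return "".join(exons[i] for i in keep_exons)


pre_mrna = transcribe(GENE)
full_form = splice(pre_mrna)                        # uses exons 1, 2 and 3
shorter_form = splice(pre_mrna, keep_exons=[0, 2])  # skips the middle exon

print(full_form)     # AUGGCCUUCGAAGGAUAA
print(shorter_form)  # AUGGCCGGAUAA
```

Both messages come from the same gene, yet they differ in sequence, which is how one stretch of DNA can give rise to more than one splice form.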
DNA is divided into protein-coding genes and non-protein-coding regions, which are increasingly recognised as important. Only about 2% of DNA is made into protein products, so the rest was previously thought to be ‘junk’ DNA. Data from the recent ENCODE project suggest that in fact around 80% of our genome is biochemically active, and that much of this ‘junk’ DNA forms part of a key regulatory system that controls how the protein-coding DNA is used in the cell and the body (3).
DNA molecules have specific structural features that allow them to interact with proteins. In particular, proteins such as transcription factors (proteins that activate a gene) can bind DNA where the chemical groups on the nucleotide bases are exposed. Which genes are activated in this way differs between cell types, and this is why a liver cell is different from a skin cell.