How can only 4 letters (the DNA bases A, T, C and G) encode all the information required to create a person, a fungus or a bacterium? The answer to this question lies, in part, in the process by which DNA is transcribed into RNA and then translated into proteins. The four DNA bases are arranged into three-letter words (codons) that are read (translated) into the 20 standard amino acids that make up proteins.
There are 64 different ways that the four DNA bases can be arranged into three-base codons. Because there are only 20 standard amino acids (plus the stop signals that end translation), some of the 64 codons are translated into the same amino acid. These are called synonymous codons, and they typically differ only in the third base.
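To make the counting concrete, here is a minimal Python sketch that lists every three-base codon and groups the synonymous ones. It assumes the standard genetic code (NCBI translation table 1) written with DNA letters; the variable names are ours and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons ordered
# by first, second, then third base over "TCAG". "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 x 4 x 4 = 64 possible three-base words.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

# Group codons that are translated into the same amino acid.
synonymous = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    synonymous[amino_acid].append(codon)

print(f"{len(codons)} codons, {len(synonymous) - 1} amino acids plus stop")
for amino_acid, family in sorted(synonymous.items()):
    print(amino_acid, len(family), " ".join(family))
```

In this table leucine, serine and arginine each have six synonymous codons, while methionine and tryptophan have only one.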
For a long time it was assumed that changes in synonymous codon usage were silent: that they couldn't impact expression or be acted upon by selection. If you end up with the same amino acid, why should it matter which synonymous codon is used?
BUT! As more genes were sequenced, it was revealed that some synonymous codons were used at high frequencies while others were avoided. This was called Codon Usage Bias.
Why are some codons used more frequently than others? One of the earliest observations was that transcripts containing frequently used codons were typically expressed at a higher level. This is partly due to the availability of different tRNAs (the decoders of codons into amino acids). When codon usage matches the available tRNAs, translation of RNA into protein occurs more efficiently, a process called Codon Optimization.
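One common way to put a number on this bias is relative synonymous codon usage (RSCU): how often a codon is observed divided by how often it would be expected if all synonymous codons for that amino acid were used equally. The Python sketch below is only an illustration of that idea, not necessarily the method used in our analyses; the function names and the toy sequence are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS))


def codon_counts(cds):
    """Count codons in a coding sequence, read in frame from its first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(cds):
    """Observed count / expected count under equal use of synonymous codons."""
    counts = codon_counts(cds)
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        families[amino_acid].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # this amino acid does not occur in the sequence
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical toy sequence: Met, five leucines, three lysines, stop.
toy_cds = "".join(["ATG", "CTG", "CTG", "CTG", "CTC", "TTA",
                   "AAG", "AAG", "AAA", "TAA"])
for codon, value in sorted(rscu(toy_cds).items(), key=lambda kv: -kv[1]):
    print(codon, CODON_TABLE[codon], round(value, 2))
```

A value above 1 means a codon is used more often than its synonyms, and a value below 1 means it is avoided. In the toy sequence, the leucine codon CTG is strongly preferred over the other five leucine codons.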
Since this initial discovery, Codon Usage Bias has been implicated in an extraordinary number of cellular and biological processes, including protein folding, stress response, transcription, mRNA stability, and the cell cycle. Many of these roles are summarized in the review "Speeding with control: codon usage, tRNAs, and ribosomes" by Eva Maria Novoa & Lluís Ribas de Pouplana (paper link). You can see a summary of these roles in the image to the right.
Given the numerous and emerging roles of codon usage in many different processes, there is important information about function and adaptation encoded within codon usage patterns. Our work is focused on extracting that information using bioinformatics to crack this code within the code. We are working on these specific hypotheses within the lab:
We hypothesize that adaptation that occurs through changes in expression can be detected using differences in codon optimization.
We hypothesize that changes in optimal codon identity (via changes in the tRNA pool) mediate sweeping expression changes in response to external stimuli.
We hypothesize that changes in optimal codon identity between species are (in part) due to shifts in ecological niche.
Why can't two parents without face freckles have a child with face freckles? Why can two parents with free earlobes have a child with attached earlobes? It's because these are Mendelian traits, where each trait is determined by a single gene that has a dominant and a recessive allele. You may even remember (fondly or not) completing Punnett squares in biology class to determine the likelihood of each of these scenarios.
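For anyone who wants to revisit that biology-class exercise, here is a minimal Python sketch of a single-gene Punnett square. The allele letters are our own illustrative choice: "F"/"f" for freckles (treated as dominant) and "E"/"e" for earlobes (free treated as dominant), following the simple Mendelian picture described above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1, parent2):
    """Offspring genotype probabilities for one gene.

    Each parent is a two-letter genotype such as "Ee". Every allele of one
    parent is paired with every allele of the other, as in the square's cells.
    """
    cells = ["".join(sorted(a + b)) for a, b in product(parent1, parent2)]
    counts = Counter(cells)
    return {genotype: Fraction(n, len(cells)) for genotype, n in counts.items()}


# Two parents without freckles (both "ff") can only have "ff" children.
print(punnett("ff", "ff"))

# Two parents with free earlobes who each carry the recessive allele ("Ee")
# have a 1 in 4 chance of a child with attached earlobes ("ee").
print(punnett("Ee", "Ee"))
```

The first cross gives only "ff" children, while the second gives "EE", "Ee" and "ee" in a 1:2:1 ratio, which is how two parents who both show the dominant trait can have a child with the recessive one.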
While some traits follow Mendelian inheritance, most human traits are complex traits. In complex traits, many different genes each have a small effect on the trait. For example, more than 10,000 positions (bases) in the human genome can affect how tall you are. This is why two shorter parents can sometimes have a tall child!
Most human diseases are complex traits where hundreds of regions in the genome contribute to the likelihood you develop the disease. Many groups around the world are working to identify which regions of the genome contribute to complex traits using methods such as Genome-Wide Association Studies. Identifying the genes that contribute to these diseases can help us understand why they occur in the first place.
Punnett Square @Wikipedia
Punnett Square with independent loci. Now imagine if it were 10,000!
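To get a feel for how quickly the square grows, here is a small Python sketch. It assumes the simplest possible case, which is ours for illustration: every locus has two alleles, both parents are heterozygous at every locus, and the loci are inherited independently. The function names are hypothetical.

```python
import math


def punnett_square_size(n_loci):
    """Return (gametes per parent, cells in the square, distinct genotypes)."""
    gametes = 2 ** n_loci    # each locus contributes one of two alleles
    cells = gametes ** 2     # one cell per pairing of gametes
    genotypes = 3 ** n_loci  # e.g. AA, Aa or aa at each locus
    return gametes, cells, genotypes


def digits_in_cell_count(n_loci):
    """Number of decimal digits in the cell count, 4 ** n_loci."""
    return math.floor(n_loci * math.log10(4)) + 1


for n in (1, 2, 3, 10):
    print(n, punnett_square_size(n))

print("Digits in the cell count for 10,000 loci:", digits_in_cell_count(10_000))
```

With two loci the square already has 16 cells and 9 distinct genotypes. With 10,000 loci the number of cells is thousands of digits long, which is why complex traits are studied statistically rather than by drawing squares.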
In our group, we turn to the power of EVOLUTION to understand how and why complex traits occur. This is based on the principle that all traits (including diseases) are the product of previous evolutionary innovations. For example, the evolution of multicellularity necessarily means that errors in multicellularity can occur, such as the uncontrolled division of cells seen in cancers.
As another example, the evolutionary innovation of live birth necessarily means that errors in birth timing can occur, such as preterm birth.
A timeline of evolutionary events (top) in the deep evolutionary past and on the human lineage that are relevant to patterns of human disease risk (bottom). The ancient innovations on this timeline (left) formed biological systems that are essential, but are also foundations for disease.
Because evolution is a major factor shaping the complex traits we see today, we use computational evolutionary and genomic methods to examine the evolutionary history of genomic regions associated with complex traits. These analyses can reveal important information about how these regions contribute to traits. In our group, we are interested in the following specific hypotheses:
We hypothesize that regions that contribute to complex traits are subject to detectable and quantifiable evolutionary forces.
We hypothesize that measuring the imprint of evolutionary forces on "silent" synonymous sites in the genome will reveal how these regions contribute to traits.