Evolutionary Forces at Play for Non-Hodgkin’s Lymphoma
A research blog by Colin Speer
Did you know that certain cancer-associated genetic mutations might have once provided us with an evolutionary advantage? As an intern this summer in Lab LaBella at UNC Charlotte, I sought to identify the evolutionary forces that have acted on regions of the human genome associated with various types of cancer. Research in the field of evolutionary medicine continues to grow and is an essential aspect of furthering our understanding of diseases including cancer. This blog post delves into the results of my summer project and what I’ve learned.
The first steps of my research project were to find a genome-wide association study (GWAS) of interest and to run its data through our GSEL pipeline. A GWAS identifies regions of the genome associated with specific traits. I wanted to work with a dataset, collected from individuals of European ancestry, that identified genomic regions associated with cancer (the GSEL pipeline has precomputed evolutionary metrics for European ancestry). The GSEL pipeline detects the enrichment and depletion of evolutionary signatures from GWAS summary statistics. The dataset I chose identified regions of the genome associated with Non-Hodgkin's Lymphoma and can be found here.
The central outputs of our pipeline are z-scores and p-values. A z-score tells us how many standard deviations a value lies from the mean of a distribution. A p-value tells us about the statistical significance of a finding. The pipeline also produces several plots. Using the output files, I created a heatmap of z-scores and p-values in RStudio. The two graphs below identify regions within the human genome with a significant evolutionary signature. The radial graph generated by the GSEL pipeline uses p-values to show significance; the most significant evolutionary signature across the complete dataset is LINSIGHT. The heatmap that I created shows the z-score for each individual region. The most enriched evolutionary signature, shown by the bright red squares, appears to be LINSIGHT, agreeing with the radial plot.
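To make the two statistics concrete, here is a minimal Python sketch of how a z-score and a one-sided empirical p-value can be computed for one observed value against a set of background values. This is a generic illustration, not the GSEL pipeline's own code: the function names and the numbers are made up for the example, and the pipeline's actual procedure may differ.

```python
import statistics


def z_score(observed, background):
    """Number of standard deviations `observed` lies from the background mean."""
    mu = statistics.mean(background)
    sigma = statistics.stdev(background)  # sample standard deviation
    return (observed - mu) / sigma


def empirical_p_value(observed, background):
    """One-sided p-value: share of background values at least as large as
    `observed`, with a +1 correction so the result is never exactly zero."""
    n_extreme = sum(1 for value in background if value >= observed)
    return (n_extreme + 1) / (len(background) + 1)


# Hypothetical values for illustration only (not results from the study).
background = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.13, 0.07, 0.10, 0.10]
observed = 0.16

z = z_score(observed, background)
p = empirical_p_value(observed, background)
print(f"z-score: {z:.2f}, empirical p-value: {p:.3f}")
```

A large positive z-score paired with a small p-value would correspond to enrichment (a bright red square on the heatmap), while a large negative z-score would correspond to depletion.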
From these plots, I chose to pursue a cancer-associated genomic region with an elevated LINSIGHT score. LINSIGHT is a statistical model that estimates negative selection on noncoding sequences in the human genome. Now that we have our region of interest, it’s time to delve further into its importance within our genome.
Using the rsID (reference SNP identifier) of the SNP enabled me to explore several databases and gain insight into its significance. As you can see from Ensembl and dbSNP, our SNP lies in the 3-prime untranslated region (UTR) of the TP53 gene (tumor protein p53), with the ancestral allele being T and the reference allele being G. Variation in the UTR of a gene does not change the protein sequence, but it may alter regulation.
TP53 is a tumor suppressor gene. Mutations in this gene may allow cancer cells to grow and spread throughout the body. Research has shown that TP53 mutations are not tied to any single type of cancer; instead, they increase cancer risk across the spectrum.
Another resource I’ve enjoyed using is RegulomeDB. By entering an rsID, you can retrieve a wide range of regulatory annotations. As you can see from the image below, the SNP lies in a region of strong transcription in many parts of the human body. This is consistent with what we know about TP53 and its importance within our genome.
From the eQTL (expression quantitative trait locus) violin plots from the GTEx Portal below, we can see that carrying the G allele is associated with lower TP53 expression in every tissue shown. No data appear to exist for individuals homozygous for G. This could indicate that the homozygous genotype is lethal, although it may simply reflect how rare the allele is.
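One way to weigh the missing homozygous-G group is to ask how many such individuals we would expect to see by chance alone. The Python sketch below applies Hardy-Weinberg proportions to a sample of a given size. The allele frequency and sample size used here are hypothetical placeholders, not values taken from GTEx or from this project.

```python
def expected_genotype_counts(minor_allele_freq, n_samples):
    """Expected genotype counts under Hardy-Weinberg equilibrium."""
    q = minor_allele_freq
    p = 1.0 - q
    return {
        "hom_major": n_samples * p * p,      # e.g. T/T
        "het": n_samples * 2 * p * q,        # e.g. T/G
        "hom_minor": n_samples * q * q,      # e.g. G/G
    }


# Hypothetical inputs for illustration only.
counts = expected_genotype_counts(minor_allele_freq=0.01, n_samples=500)
for genotype, expected in counts.items():
    print(f"{genotype}: {expected:.2f}")
```

With these placeholder inputs, the expected number of minor-allele homozygotes is about 0.05, far fewer than one person. If the real allele frequency is similarly low, an empty homozygous group would not by itself be evidence of lethality.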
Pleiotropy is when a single gene influences many different traits or characteristics within an organism. A gene with a pleiotropic effect can contribute to many parts of an individual’s phenotype. When searching for our SNP in the GWAS Catalog, I found 79 associations across 42 traits. This suggests that the region has a pleiotropic effect.
Fifty-two of the associations are linked with the G allele. Many of these traits involve an increased risk of various cancers, among other diseases. Interestingly, however, individuals with the G allele appear to have lower diastolic blood pressure. Fourteen of the associations are linked with the T allele. These traits include increased pulse pressure and diastolic blood pressure. An interesting association with the T allele is an apparent increased risk for triple-negative breast cancer, a type of breast cancer that lacks the three receptors (estrogen, progesterone, and HER2) commonly found in breast cancer cells.
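For readers who want to reproduce this kind of tally, here is a minimal Python sketch of counting associations, distinct traits, and associations per allele from a list of records. The record layout, field names, and entries are hypothetical placeholders; a real GWAS Catalog export has its own column names and would need to be parsed first.

```python
from collections import Counter

# Placeholder records for illustration only (not real GWAS Catalog entries).
records = [
    {"trait": "trait A", "risk_allele": "G"},
    {"trait": "trait A", "risk_allele": "G"},
    {"trait": "trait B", "risk_allele": "G"},
    {"trait": "trait C", "risk_allele": "T"},
]


def summarize(records):
    """Return (number of associations, number of distinct traits,
    associations per risk allele)."""
    n_associations = len(records)
    n_traits = len({record["trait"] for record in records})
    per_allele = Counter(record["risk_allele"] for record in records)
    return n_associations, n_traits, per_allele


n_associations, n_traits, per_allele = summarize(records)
print(n_associations, "associations across", n_traits, "traits")
print(dict(per_allele))
```

The number of associations can exceed the number of traits because the same trait may be reported by more than one study.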
The Phenome-wide association study (PheWAS) plot below shows that the effect allele G is associated with many characteristics.
From my findings above, I have developed hypotheses about the pronounced negative selection observed in the region. For starters, the SNP lies within the untranslated region of the TP53 gene. UTRs are a critical part of the post-transcriptional regulation of gene expression, making this region particularly important to the gene. Carrying the effect allele rather than the ancestral allele appears to be significantly associated with increased cancer risk. This leads us to a significant question: “Why has the G allele not undergone more negative selection within the human population?”
My first hypothesis is that the effect allele G has not existed long enough for negative selection to remove it. Comparative genomics information on the region shows that all available mammal species share the ancestral allele T, as do the available Neanderthal populations. This suggests that the G allele is relatively new to our genomes, since it is not shared with other species. The effect allele may simply not be old enough for negative selection to have acted on it.
My alternative hypothesis is that the effect allele G may be advantageous in other situations. This is referred to as a trade-off and may explain the presence of the allele. From the information in the GWAS Catalog, we know the odds ratios for various cancers and diseases. As previously stated, the G allele is associated with an increased risk for many types of cancer, and the T allele is associated with higher pulse pressure and diastolic blood pressure. Could it be that in specific populations, this trade-off is a viable option? In a region characterized by a diet and lifestyle that pose a high risk for heart disease, this trade-off could potentially hold evolutionary significance.
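Since the argument leans on odds ratios, here is a minimal Python sketch of how an odds ratio and an approximate 95% confidence interval can be computed from a two-by-two table of carriers and non-carriers among cases and controls. The counts are invented for illustration and are not taken from the GWAS Catalog; published odds ratios usually come from regression models that adjust for covariates, which this sketch does not attempt.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds of carrying the allele among cases divided by the odds among controls."""
    return (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Approximate confidence interval using the standard error of log(OR)."""
    log_or = math.log(odds_ratio(case_carriers, case_noncarriers,
                                 control_carriers, control_noncarriers))
    se = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                   + 1 / control_carriers + 1 / control_noncarriers)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for illustration only.
table = (30, 70, 20, 80)
estimate = odds_ratio(*table)
lower, upper = odds_ratio_ci(*table)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An odds ratio above 1 means the allele is more common among cases than controls. If the confidence interval includes 1, the data are also compatible with no association.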
Working on my summer project in Lab LaBella has enriched my understanding of bioinformatics, evolutionary biology, and computational research. Applying the skills I have learned in Charlotte’s master's program to real-world datasets has allowed me to explore facets of bioinformatics I had not considered before. Moving forward, I am excited to continue furthering my knowledge of cancer biology and conducting research in this field of bioinformatics.
About the Author
Colin is a second-year bioinformatics master's student at the University of North Carolina at Charlotte, holding a B.S. in computer science from the same institution. His research interests are utilizing machine learning and statistics to address challenges in cancer research. Beyond academics, Colin excels as a competitive Esports player, proudly representing the university in collegiate competitions. He also indulges in his passion for sports by avidly following hockey and Formula One.