
Decoding fungal metabolism with machine learning by Isabel Montoya

  • Writer: Abigail LaBella
  • Sep 9
  • 3 min read

     During the Spring 2025 semester, I had an incredible opportunity to work with Dr. LaBella on a research project exploring codon usage bias in two metabolic pathways, the pentose phosphate (pentose) and autophagy pathways. We looked at these pathways across four yeast species: Saccharomyces cerevisiae, Saccharomyces uvarum, Candida albicans, and Candida dubliniensis. Codons are three-nucleotide "words" in a gene's sequence, and each one specifies an amino acid or a stop signal. Most amino acids are encoded by several synonymous codons, which are different codons that code for the same amino acid. Codon usage bias is the phenomenon in which an organism prefers some synonymous codons over others. Such a preference can enhance translational efficiency, and it may vary by protein, metabolic pathway, or species. The pentose pathway produces molecules essential for biosynthesis, while the autophagy pathway recycles intracellular material. Both pathways are important and conserved in many organisms, from yeasts to humans.


     To investigate codon usage bias, we used the Relative Synonymous Codon Usage (RSCU) metric along with KEGG IDs. RSCU is calculated by dividing the number of times a codon is observed by the number of times it would be expected if all synonymous codons for that amino acid were used equally. An RSCU value of 1 means there is no preference. A value greater than 1 means the codon is used more often than expected. KEGG IDs are identifiers used in the KEGG (Kyoto Encyclopedia of Genes and Genomes) database to classify genes, proteins, and pathways by sequence and function. Using Dr. LaBella's model, which is built on the Random Forest machine learning algorithm, I analyzed codon usage patterns in the pentose and autophagy pathways of the four yeast species. We tested two hypotheses. First, the model would be able to distinguish between pentose and autophagy genes based on a yeast species' codon usage. Second, the closely related pairs S. cerevisiae and S. uvarum, and C. albicans and C. dubliniensis, would share the same important codons because of their close evolutionary relationships.
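The RSCU calculation described above can be sketched in a few lines. This is a minimal Python illustration, not Dr. LaBella's code. The synonym table `SYNONYMS` is a hypothetical, partial table that covers only glycine and lysine. The function names `count_codons` and `rscu` are also made up for this example. For each amino acid, the sketch divides a codon's observed count by the count expected if all of that amino acid's synonymous codons were used equally.

```python
from collections import Counter

# Hypothetical, partial synonym table used only for illustration. A real
# analysis would cover every amino acid in the species' genetic code.
SYNONYMS = {
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Lys": ["AAA", "AAG"],
}


def count_codons(cds):
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(cds, synonyms=SYNONYMS):
    """Relative Synonymous Codon Usage for each codon listed in `synonyms`.

    RSCU = observed count / count expected if all synonymous codons
    for that amino acid were used equally often.
    """
    counts = count_codons(cds)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this gene: RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


print(rscu("GGTGGTGGCAAA"))
```

In the toy sequence, GGT makes up two of the three glycine codons. Its expected count is 3 / 4 = 0.75, so its RSCU is 2 / 0.75 ≈ 2.67, well above 1, while the unused GGA and GGG get 0. One caveat applies to a full table: C. albicans and C. dubliniensis read the CTG codon mainly as serine rather than leucine, so the synonym groups for those two genomes would need adjusting.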


     We provided the model with RSCU values and KEGG gene IDs for each yeast species, along with KEGG pathway templates that list the genes involved in each metabolic pathway. The Random Forest algorithm was then trained on randomly selected subsets of genes to learn codon usage patterns and predict whether a gene belongs to a pathway. We tested two strategies. In the first, the model predicted whether or not a gene belonged to the pentose pathway. In the second, it predicted whether a gene belonged to the pentose or the autophagy pathway. In the first strategy, prediction error rates were lower for S. cerevisiae and S. uvarum than for C. albicans and C. dubliniensis. This difference could reflect evolutionary differences between the genera, since the Candida species are pathogenic and the Saccharomyces species are not. Comparing the two strategies, I found that the model was more accurate with the second. This suggests that it predicts best when distinguishing between two pathways rather than separating one pathway from all other genes. I also found that each yeast species' most important codons were used more frequently in the pentose pathway than in the autophagy pathway, which may indicate stronger codon usage bias in the pentose pathway. In both Candida species, the model ranked GGT (a glycine codon) as the most important codon, possibly reflecting a conserved role in the Candida genus.
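The two strategies above differ mainly in how genes are labeled before training. The Python sketch below shows one way the labeled datasets might be assembled from KEGG pathway gene lists and per-gene RSCU values. It also includes a random training/held-out split and a simple misclassification rate. It is an illustration only and is not Dr. LaBella's model.

The following choices are all hypothetical:
- the function names,
- the 70/30 split,
- filling missing codons with 0.0,
- dropping genes found in both pathways from the second strategy.

The Random Forest itself is not shown. Any implementation, such as R's randomForest package or scikit-learn's RandomForestClassifier, could be trained on the resulting `X` and `y`.

```python
import random


def label_genes(gene_ids, pentose_ids, autophagy_ids, strategy):
    """Build (gene, label) pairs for one of the two classification strategies.

    strategy 1: label 1 = pentose pathway gene, 0 = any other gene.
    strategy 2: label 1 = pentose, 0 = autophagy; genes in neither pathway,
                or in both, are left out (a hypothetical choice).
    """
    pentose, autophagy = set(pentose_ids), set(autophagy_ids)
    labeled = []
    for gene in gene_ids:
        if strategy == 1:
            labeled.append((gene, 1 if gene in pentose else 0))
        elif strategy == 2:
            in_p, in_a = gene in pentose, gene in autophagy
            if in_p != in_a:  # keep only genes in exactly one pathway
                labeled.append((gene, 1 if in_p else 0))
        else:
            raise ValueError("strategy must be 1 or 2")
    return labeled


def feature_matrix(labeled, rscu_by_gene, codons):
    """One row of RSCU values per gene; missing codons default to 0.0 (hypothetical)."""
    X = [[rscu_by_gene[gene].get(c, 0.0) for c in codons] for gene, _ in labeled]
    y = [label for _, label in labeled]
    return X, y


def random_split(labeled, train_fraction=0.7, seed=0):
    """Shuffle genes and split them into training and held-out subsets."""
    rng = random.Random(seed)
    shuffled = list(labeled)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def error_rate(true_labels, predicted_labels):
    """Fraction of genes assigned to the wrong class."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)
```

Suppose there are four genes, one in the pentose pathway and two in autophagy. The first strategy keeps all four genes and labels three of them 0. The second strategy drops the gene that belongs to neither pathway. This gives the classifier a smaller, more balanced two-pathway contrast, which is one plausible reason the two strategies could perform differently.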


     Throughout this project, I developed skills in R programming, including data analysis and visualization with ggplot2, and gained firsthand experience working in a collaborative lab environment. I also improved my ability to read, interpret, and communicate scientific research, both by presenting my own findings and by learning from labmates. This project aligned closely with my interests in bioinformatics and molecular biology and allowed me to apply and deepen the knowledge from my bioinformatics minor. I'm grateful for Dr. LaBella's guidance and expertise throughout the project; her mentorship was instrumental to my technical and professional growth. Working with her, I also saw how thoughtful, inclusive leadership can foster a supportive and collaborative research environment.


Written by Isabel Montoya