Proteins, the key functional units of cells, are synthesized during translation, where the RNA template is read in triplets called codons, each corresponding to a specific amino acid. The genetic code contains 61 sense codons encoding 20 amino acids, and as a result, most amino acids can be encoded by two or more 'synonymous' codons. The choice of one synonymous codon over another was long thought to carry no importance, but emerging evidence suggests that codon usage can significantly impact protein synthesis and folding. Some codons are translated faster than others, and this difference in speed can alter how an amino acid sequence is folded into a protein. Moreover, slow translation can trigger RNA degradation and thereby halt protein production. Despite the importance of these effects for cellular function, the mechanisms through which codon-level information acts remain largely obscure.
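The redundancy described above can be checked directly against the standard genetic code. The following is a minimal illustrative sketch, not part of our analysis pipeline: the variable names are chosen here for illustration, and the table is the standard code written in the conventional U, C, A, G base order with `*` marking stop codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order
# (UUU, UUC, UUA, UUG, UCU, ...). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sense codons are those that encode an amino acid rather than a stop.
sense_codons = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}

# Number of synonymous codons available for each amino acid.
degeneracy = Counter(sense_codons.values())

# Amino acids with exactly one codon have no synonymous alternative.
single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)

print(len(sense_codons))   # 61
print(len(degeneracy))     # 20
print(single_codon)        # ['M', 'W']
```

Only methionine (M) and tryptophan (W) have a single codon; the remaining 18 amino acids have between two and six synonymous codons each.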
We use a variety of data-driven approaches to unravel the complex mechanisms underlying codon usage effects. We perform large-scale analysis of publicly available translational profiling data (Ribo-seq) to improve existing analysis methods and to identify novel regulatory mechanisms determining ribosome decoding dynamics. We also employ machine learning to develop in silico models that can simulate the effect of codon usage and be used to identify key factors that influence codon-level effects on gene expression. In addition, we investigate the role of codon usage in Drosophila, a model organism that offers unique opportunities to study the evolution of codon usage and host-parasite interactions.
Current collaborators include Ben Nicholson, who works on the experimental side of this project, including the use of the Drosophila model system.