Synonymous codon usage and its bias in the bacterial proteomes primarily offset GC content variation to maintain optimal amino acid compositions
DOI:
https://doi.org/10.51094/jxiv.99
Keywords:
synonymous codon, codon usage bias, amino acid composition, GC content
Abstract
Codon usage bias is the preferential, non-random use of synonymous codons, which varies among species. A recent review concluded that this bias is a complex phenomenon influenced by numerous factors, including genome composition, GC content, expression level, gene length, and recombination rate. In this paper, I present a new type of plot and offer a more straightforward explanation of the primary function of synonymous codon usage and its bias.
First, I calculated the amino acid composition of each protein and the nucleotide composition of its gene from the publicly available proteome coding sequence data sets of 23 different bacteria. Next, for each protein, I calculated the maximum and minimum GC contents possible among all synonymous gene variants that encode its amino acid composition. These maximum and minimum values were then plotted against the actual GC content of the gene on a scatter plot.
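The maximum and minimum calculation described above can be sketched as follows. This is a minimal illustration, not the author's actual script: it assumes the standard codon assignments (which bacterial translation table 11 shares), excludes the stop codon, and uses hypothetical helper names (`gc_bounds`, `gc_content`).

```python
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G; third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_count(seq):
    """Number of G or C bases in a nucleotide string."""
    return sum(base in "GC" for base in seq)


# For each amino acid, the lowest and highest GC count among its synonymous codons.
GC_RANGE = {}
for codon, aa in CODON_TABLE.items():
    if aa == "*":  # assumption: stop codons are left out of the calculation
        continue
    n = gc_count(codon)
    lo, hi = GC_RANGE.get(aa, (n, n))
    GC_RANGE[aa] = (min(lo, n), max(hi, n))


def gc_bounds(protein):
    """Return (min, max) possible GC content of any gene encoding this protein."""
    n_nt = 3 * len(protein)
    lo = sum(GC_RANGE[aa][0] for aa in protein)
    hi = sum(GC_RANGE[aa][1] for aa in protein)
    return lo / n_nt, hi / n_nt


def gc_content(cds):
    """Actual GC content of a coding sequence (stop codon assumed removed)."""
    return gc_count(cds) / len(cds)
```

For the toy peptide `MKL`, the bounds are 1/9 and 4/9, because Met contributes exactly one G or C, Lys zero or one, and Leu zero to two. Only the amino acid composition matters here, not the residue order, which matches the description in the abstract.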
The plot showed a clear tendency. Genes with lower actual GC content lie closer to the minimum possible GC content for the proteins they encode, whereas genes with higher actual GC content lie closer to the maximum possible GC content. This tendency indicates that synonymous codon usage bias works uniformly toward offsetting the variation in GC content. Meanwhile, the plotted maximum values and the plotted minimum values each lined up in a row within a narrow band. I therefore consider that the optimal range of amino acid composition of the proteome is relatively limited and that organisms use this GC offset function to stay within that range.
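One way to put a number on "closer to the minimum" or "closer to the maximum" is to express the actual GC content as a fraction of the distance between the two bounds. The abstract does not define this quantity, so the function below and its name `relative_gc_position` are assumptions added purely for illustration.

```python
def relative_gc_position(actual, gc_min, gc_max):
    """Position of the actual GC content between its possible bounds.

    Returns 0.0 when the gene sits at the minimum possible GC content,
    1.0 when it sits at the maximum, and None when the bounds coincide
    (for example, a peptide made only of Met and Trp).
    """
    if gc_max == gc_min:
        return None
    return (actual - gc_min) / (gc_max - gc_min)
```

Under the tendency reported above, this value would be expected to rise with the actual GC content across the proteomes examined. For example, a gene with actual GC content 3/9 and bounds of 1/9 and 4/9 sits about two thirds of the way toward the maximum.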
Synonymous codons are part of the genetic code table. Therefore, if synonymous codons and their usage bias have a GC offset function to maintain the optimal amino acid composition, it must be considered a fundamental function of the genetic code table assignment.
References
Parvathy, S. T., Udayasuriyan, V., & Bhadana, V. (2022). Codon usage bias. Molecular Biology Reports, 49(1), 539–565. https://doi.org/10.1007/s11033-021-06749-4
Genome dataset, "Fusobacterium nucleatum subsp. nucleatum", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_003019295.1/
Genome dataset, "Mycoplasma genitalium G37", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000027325.1/
Genome dataset, "Dictyoglomus turgidum DSM 6724", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000021645.1/
Genome dataset, "Thermodesulfovibrio yellowstonii DSM 11347", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000020985.1/
Genome dataset, "Leptospira interrogans", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_001569005.1/
Genome dataset, "Helicobacter pylori 26695", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000307795.1/
Genome dataset, "Chlamydia trachomatis D/UW-3/CX", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000008725.1/
Genome dataset, "Bacteroides thetaiotaomicron", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_014131755.1/
Genome dataset, "Aquifex aeolicus VF5", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000008625.1/
Genome dataset, "Bacillus subtilis subsp. subtilis str. 168", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000009045.1/
Genome dataset, "Thermotoga maritima MSB8", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000230655.2/
Genome dataset, "Synechocystis sp. PCC 6803", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000009725.1/
Genome dataset, "Escherichia coli str. K-12 substr. MG1655", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000005845.2/
Genome dataset, "Neisseria meningitidis MC58", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000008805.1/
Genome dataset, "Chloroflexus aurantiacus J-10-fl", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000018865.1/
Genome dataset, "Rhodopirellula baltica SH 1", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000196115.1/
Genome dataset, "Geobacter sulfurreducens PCA", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000007985.2/
Genome dataset, "Gloeobacter violaceus PCC 7421", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000011385.1/
Genome dataset, "Bradyrhizobium diazoefficiens USDA 110", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000011365.1/
Genome dataset, "Mycobacterium tuberculosis H37Rv", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000195955.2/
Genome dataset, "Pseudomonas aeruginosa PAO1", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000006765.1/
Genome dataset, "Deinococcus radiodurans ATCC 13939", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_020546685.1/
Genome dataset, "Streptomyces coelicolor A3(2)", NCBI website, https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_008931305.1/
"Reference proteomes - Primary proteome sets for the Quest For Orthologs", EMBL-EBI website, https://www.ebi.ac.uk/reference_proteomes/
Posted
Submitted: 2022-06-26 21:06:34 UTC
Published: 2022-06-28 09:29:54 UTC
Versions
- 2022-07-04 04:13:56 UTC (2)
- 2022-06-28 09:29:54 UTC (1)
License
Copyright (c) 2022
Genshiro Esumi
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.