The standard genetic code is designed to generate transmembrane domains and intrinsically disordered regions as projections of the thymine density on the gene
DOI:
https://doi.org/10.51094/jxiv.533
Keywords:
standard genetic code, thymine composition, transmembrane domains, intrinsically disordered regions, optimized translation
Abstract
The codon-amino acid correspondence in the genetic code is known to be non-random. However, there has been no established theory as to whether this correspondence was designed for any purpose or function. In a previous report, I showed that proteins rich in transmembrane domains and proteins rich in intrinsically disordered regions correspond to high and low TA (thymine-adenine) skew of their genes, respectively, and I speculated that this reflects the purpose behind the design of the genetic code. However, since most protein-coding genes use synonymous codon selection to balance their GC (guanine-cytosine) content, and therefore their TA content, I hypothesized that the amount of only one of these two nucleotides, thymine or adenine, might actually give rise to the characteristic amino acid compositions of these two functional domains/regions.
In this study, I examined the correspondence between these two functional domains/regions and the estimated nucleotide composition of protein-coding genes across the proteomes of different organisms, back-calculating the possible nucleotide compositions of each gene from the amino acid composition of its protein.
The results showed that proteins rich in transmembrane domains and proteins rich in intrinsically disordered regions were indeed correlated with higher and lower estimated thymine composition of their genes, respectively. On detailed analysis, transmembrane domains correlated more strongly with the maximum estimated thymine composition, and intrinsically disordered regions correlated more strongly with the minimum estimated thymine composition.
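As a rough illustration of the back-calculation mentioned above, the following Python sketch estimates the minimum and maximum thymine fraction that a coding sequence could have, given only the amino acid sequence of its protein. It assumes the standard genetic code and assumes that the bounds are obtained by choosing, for each residue, the synonymous codon with the fewest or the most thymines. The function name `thymine_bounds` and the handling of unknown residues are illustrative choices, not the procedure used in the study, whose details are not given in this abstract.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# For every amino acid, the fewest and the most thymines found among its
# synonymous codons. Stop codons ("*") are ignored.
T_RANGE = {}
for codon, aa in CODON_TABLE.items():
    if aa == "*":
        continue
    t = codon.count("T")
    lo, hi = T_RANGE.get(aa, (t, t))
    T_RANGE[aa] = (min(lo, t), max(hi, t))


def thymine_bounds(protein):
    """Return (min, max) thymine fraction of a gene that could encode `protein`.

    Assumption of this sketch: residues outside the 20 standard amino acids
    (for example "X" or "*") are skipped rather than estimated.
    """
    known = [aa for aa in protein.upper() if aa in T_RANGE]
    if not known:
        raise ValueError("no standard amino acid residues found")
    n_bases = 3 * len(known)
    t_min = sum(T_RANGE[aa][0] for aa in known)
    t_max = sum(T_RANGE[aa][1] for aa in known)
    return t_min / n_bases, t_max / n_bases


if __name__ == "__main__":
    print(thymine_bounds("FLIVFLIV"))  # hydrophobic stretch
    print(thymine_bounds("KEQKEQPG"))  # charged, polar stretch
```

Under these assumptions, a hydrophobic stretch such as `FLIV` has bounds of 5/12 and 9/12, whereas a charged, polar stretch such as `KEQ` has 0 for both bounds, because none of the codons for lysine, glutamate, or glutamine contains thymine.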
Because the amino acid compositions of membrane proteins encoded by thymine-rich genes correspond to the maximum estimated thymine compositions, and those of intrinsically disordered proteins encoded by thymine-poor genes correspond to the minimum estimated thymine compositions, it is more reasonable to assume that the characteristic amino acid compositions of both domains/regions are shaped by the thymine densities of the genes than that these thymine density structures are shaped by selective pressure on amino acid composition. Thus, the functions of these two domains/regions are thought to arise as projections of the thymine densities of their properly preformed gene sequences.
The results shown in this study suggest that the standard genetic code has an optimized structure that allows for optimized translation and synthesis of the functional domains of proteins. I conclude that the current genetic code must have been selected for this functional advantage, and I propose this as the "optimized translation" theory that explains the origin of the genetic code.
Conflicts of Interest Disclosure
The author declares no conflicts of interest associated with this manuscript.
References
Esumi, G. (2023). The TA Skew of a Gene Primarily Determines the Type of Protein, Such as Membrane Protein or Intrinsically Disordered Protein. Jxiv. https://doi.org/10.51094/jxiv.446
Esumi, G. (2022). Synonymous codon usage and its bias in the bacterial proteomes primarily offset guanine and cytosine content variation to maintain optimal amino acid compositions. Jxiv. https://doi.org/10.51094/jxiv.99
"Quest for Orthologs" group. (2023) Reference proteomes - Primary proteome sets for the Quest For Orthologs, RELEASE 2023_03. https://www.ebi.ac.uk/reference_proteomes/ Accessed 1 Sep 2023
The UniProt Consortium. (2023) UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51:D523–D531. https://doi.org/10.1093/nar/gkac1052
Esumi, G. (2023). The Distributions of Amino Acid Compositions of Proteins in an Organism’s Proteome Uniformly Approximate Binomial Distributions. Jxiv. https://doi.org/10.51094/jxiv.408
Vakirlis, N., Acar, O., Hsu, B., Castilho Coelho, N., Van Oss, S. B., Wacholder, A., Medetgul-Ernar, K., Bowman, R. W., II, Hines, C. P., Iannotta, J., Parikh, S. B., McLysaght, A., Camacho, C. J., O’Donnell, A. F., Ideker, T., & Carvunis, A.-R. (2020). De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. In Nature Communications (Vol. 11, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1038/s41467-020-14500-z
Efimov, V. M., Efimov, K. V., Kovaleva, V. Yu., & Matushkin, Yu. G. (2021). Principal Components of Genetic Sequences: Correlations and Significance. In Mathematical Biology and Bioinformatics (Vol. 16, Issue 2, pp. 299–316). Institute of Mathematical Problems of Biology of RAS (IMPB RAS). https://doi.org/10.17537/2021.16.299
Esumi, G. (2023). The Nucleic Acid Sequences of the Genome Are Highly Structured on a Genome-Wide Scale in Terms of Nucleic Acid Composition Indices Such as TA Skew and GC Skew. Jxiv. https://doi.org/10.51094/jxiv.436
Crick, F. H. C. (1968). The origin of the genetic code. In Journal of Molecular Biology (Vol. 38, Issue 3, pp. 367–379). Elsevier BV. https://doi.org/10.1016/0022-2836(68)90392-6
Haig, D., & Hurst, L. D. (1991). A quantitative measure of error minimization in the genetic code. Journal of Molecular Evolution, 33(5), 412–417. https://doi.org/10.1007/BF02103132
Seki, M. (2023). On the origin of the genetic code. In Genes & Genetic Systems (Vol. 98, Issue 1, pp. 9–24). Genetics Society of Japan. https://doi.org/10.1266/ggs.22-00085
Tourancheau, A. B., Tsao, N., Klobutcher, L. A., Pearlman, R. E., & Adoutte, A. (1995). Genetic code deviations in the ciliates: evidence for multiple and independent events. The EMBO Journal, 14(13), 3262–3267. https://doi.org/10.1002/j.1460-2075.1995.tb07329.x
Posted
Submitted: 2023-10-20 05:35:01 UTC
Published: 2023-10-25 23:27:30 UTC
License
Copyright (c) 2023 Genshiro Esumi
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.