Preprint / Version 1

Statistical Extremes of Amino Acid Residue Composition of the Proteome Proteins Can Explain the Origin of the Universality of the Genetic Code


DOI: https://doi.org/10.51094/jxiv.575

Keywords:

amino acid composition, transmembrane domain, intrinsically disordered region, genetic code, optimized translation theory, Chargaff’s second parity rule, GC content, TA skew, GC skew

Abstract

Organisms have evolved and diverged from a common ancestor, and today many different species live in many different environments. Because these organisms share a nearly identical genetic code, it is thought that the genetic code has changed little from that of the common ancestor over the course of evolution. However, the reason for this universality, that is, why almost no organism has ever changed its genetic code, is not well understood.

In the present study, principal component analyses of the amino acid residue composition of the proteome proteins of different species revealed that proteins rich in transmembrane domains (TMDs) and proteins rich in intrinsically disordered regions (IDRs) almost universally occupy the two extremes of each proteome's plot of the first and second principal components. The TMD and IDR content of these proteins correlated not only with their amino acid composition but also with the nucleic acid composition of their corresponding genes.
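A minimal sketch of this kind of analysis follows, assuming that each protein is represented by the fractions of the 20 standard amino acids and that PCA is performed by singular value decomposition of the column-centered composition matrix. The function names and the toy sequences are hypothetical illustrations, not the author's pipeline or data. Because the sign of a principal component is arbitrary, "extremes" here means the minimum and maximum along a component, whichever protein lands at which end.

```python
import numpy as np

# The 20 standard amino acids (one-letter codes); other letters such as X or U are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> np.ndarray:
    """Return the fraction of each standard amino acid in a protein sequence."""
    counts = np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def pca_scores(seqs, n_components=2):
    """Project per-protein compositions onto the leading principal components.

    PCA is done by SVD of the column-centered composition matrix
    (one row per protein, one column per amino acid).
    """
    X = np.vstack([composition(s) for s in seqs])
    Xc = X - X.mean(axis=0)  # center each amino acid column
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T  # coordinates on PC1, PC2, ...
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]


# Hypothetical toy proteins: a hydrophobic, TMD-like stretch, a polar,
# IDR-like stretch, and their concatenation (whose composition is their average).
tm_like = "LLIVFAALLIVGAILLVFLAIVLL"
idr_like = "SPEKKPSGSEQPKESSPEKKGSQP"
seqs = [tm_like, idr_like, tm_like + idr_like]

scores, explained = pca_scores(seqs)
pc1 = scores[:, 0]
# Indices of the proteins at the two ends of PC1 (sign of PC1 is arbitrary).
extremes = {int(np.argmin(pc1)), int(np.argmax(pc1))}
print("PC1 scores:", np.round(pc1, 3))
print("proteins at the PC1 extremes:", sorted(extremes))
```

With a real proteome (for example, a set of reference proteome sequences), the same two functions would be applied to every protein, and the proteins at each end of PC1 and PC2 could then be compared with independent TMD and IDR annotations.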

In my previous report, I showed that the genetic code itself has a structure that can assist the generation of TMDs and IDRs by exploiting partial biases in the nucleic acid composition of gene sequences. The present statistical analyses further show that TMD- and IDR-rich proteins consistently occupy the statistical extremes of amino acid composition in the proteomes of different organisms. If TMDs and IDRs are always the two largest domains/regions with extreme amino acid composition in the proteome, and if the genetic code has a structure that helps synthesize TMDs and IDRs, then the structure of the current genetic code may have been selected to meet the requirements of the characteristic amino acid compositions of these functional domains. If this is true, it would be reasonable for such a genetic code to have become universal.
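As an illustration of the kind of code structure and nucleic acid composition indices involved (a sketch, not the author's method), the block below uses the standard genetic code to show one well-known regularity: every codon with T (U in mRNA) at the second position encodes one of the hydrophobic amino acids F, L, I, M, or V, whereas codons with A at the second position encode polar or charged residues or stop. It also computes GC content, GC skew (G - C)/(G + C), and TA skew for a DNA sequence. TA skew is taken here as (T - A)/(T + A); this sign convention is an assumption and may differ from the one used in the author's earlier reports. The helper names and the toy gene are hypothetical.

```python
import math

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODE = dict(zip(CODONS, AMINO))


def second_position_amino_acids(base: str) -> set:
    """Amino acids (and '*' for stop) encoded by codons with `base` at position 2."""
    return {aa for codon, aa in CODE.items() if codon[1] == base}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def composition_indices(seq: str) -> dict:
    """GC content, GC skew, and TA skew of a single DNA strand."""
    s = seq.upper()
    a, t, g, c = (s.count(b) for b in "ATGC")
    total = a + t + g + c
    return {
        "gc_content": (g + c) / total if total else math.nan,
        "gc_skew": (g - c) / (g + c) if g + c else math.nan,
        # Assumed convention: TA skew = (T - A) / (T + A).
        "ta_skew": (t - a) / (t + a) if t + a else math.nan,
    }


print(sorted(second_position_amino_acids("T")))  # ['F', 'I', 'L', 'M', 'V']
print(sorted(second_position_amino_acids("A")))  # ['*', 'D', 'E', 'H', 'K', 'N', 'Q', 'Y']

gene = "ATGTTTCTGATTGTGGCCAAAGAATAA"  # hypothetical toy open reading frame
print(composition_indices(gene))
print(composition_indices(reverse_complement(gene)))
```

Taking the reverse complement swaps the A/T and G/C counts, so both skews change sign while GC content is unchanged. Under Chargaff's second parity rule (A ≈ T and G ≈ C within a single long strand), genome-wide skews are close to zero, which is why locally biased skews in individual genes stand out.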

This is a new explanation for the universality of the genetic code, which I call "The Optimized Translation Theory". This theory should provide a partial, function-based explanation for the origin of the standard genetic code.

Conflicts of Interest Disclosure

The author declares no conflicts of interest associated with this manuscript.


References

Crick, F. H. C. (1968). The origin of the genetic code. In Journal of Molecular Biology (Vol. 38, Issue 3, pp. 367–379). Elsevier BV. https://doi.org/10.1016/0022-2836(68)90392-6

Hamashima, K., & Kanai, A. (2013). Alternative genetic code for amino acids and transfer RNA revisited. In BioMolecular Concepts (Vol. 4, Issue 3, pp. 309–318). Walter de Gruyter GmbH. https://doi.org/10.1515/bmc-2013-0002

Kun, Á., & Radványi, Á. (2018). The evolution of the genetic code: Impasses and challenges. In Biosystems (Vol. 164, pp. 217–225). Elsevier BV. https://doi.org/10.1016/j.biosystems.2017.10.006

Seki, M. (2023). On the origin of the genetic code. In Genes & Genetic Systems (Vol. 98, Issue 1, pp. 9–24). Genetics Society of Japan. https://doi.org/10.1266/ggs.22-00085

Esumi, G. (2023). The standard genetic code is designed to generate transmembrane domains and intrinsically disordered regions as projections of the thymine density on the gene. Jxiv. https://doi.org/10.51094/jxiv.533

Esumi, G. (2023). The TA Skew of a Gene Primarily Determines the Type of Protein, Such as Membrane Protein or Intrinsically Disordered Protein. Jxiv. https://doi.org/10.51094/jxiv.446

"Quest for Orthologs" group. (2023). Reference proteomes - Primary proteome sets for the Quest For Orthologs, RELEASE 2023_03. https://www.ebi.ac.uk/reference_proteomes/ Accessed 1 Sep 2023.

Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Ahmad, S., Alpi, E., Bowler-Barnett, E. H., Britto, R., Bye-A-Jee, H., Cukura, A., Denny, P., Dogan, T., Ebenezer, T., Fan, J., Garmiri, P., da Costa Gonzales, L. J., Hatton-Ellis, E., Hussein, A., … Zhang, J. (2022). UniProt: the Universal Protein Knowledgebase in 2023. In Nucleic Acids Research (Vol. 51, Issue D1, pp. D523–D531). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkac1052 Accessed 1 Sep 2023

Esumi, G. (2023). The Synonymous Codon Usage of a Protein Gene Is Primarily Determined by the Guanine + Cytosine Content of the Individual Gene Rather Than the Species to Which It Belongs To Synthesize Proteins With a Balanced Amino Acid Composition. Jxiv. https://doi.org/10.51094/jxiv.561

Nirenberg, M. W., & Matthaei, J. H. (1961). The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. In Proceedings of the National Academy of Sciences (Vol. 47, Issue 10, pp. 1588–1602). Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.47.10.1588

Fariselli, P., Taccioli, C., Pagani, L., & Maritan, A. (2020). DNA sequence symmetries from randomness: the origin of the Chargaff’s second parity rule. In Briefings in Bioinformatics (Vol. 22, Issue 2, pp. 2172–2181). Oxford University Press (OUP). https://doi.org/10.1093/bib/bbaa041

Esumi, G. (2023). The Nucleic Acid Sequences of the Genome Are Highly Structured on a Genome-Wide Scale in Terms of Nucleic Acid Composition Indices Such as TA Skew and GC Skew. Jxiv. https://doi.org/10.51094/jxiv.436

Osawa, S., Ohama, T., Jukes, T. H., & Watanabe, K. (1989). Evolution of the mitochondrial genetic code I. Origin of AGR serine and stop codons in metazoan mitochondria. In Journal of Molecular Evolution (Vol. 29, Issue 3, pp. 202–207). Springer Science and Business Media LLC. https://doi.org/10.1007/bf02100203

Nikolaou, C., & Almirantis, Y. (2006). Deviations from Chargaff’s second parity rule in organellar DNA. In Gene (Vol. 381, pp. 34–41). Elsevier BV. https://doi.org/10.1016/j.gene.2006.06.010


Posted

Submitted: 2023-12-17 20:56:15 UTC

Published: 2023-12-22 04:20:16 UTC

Section: Biology, Life Sciences & Basic Medicine