Preprint / Version 1

The Standard Genetic Code Predominantly Assigns Uracil-Containing Codons to Amino Acids Enriched in Transmembrane Domains and Uracil-Free Codons to Amino Acids Enriched in Intrinsically Disordered Regions

DOI:

https://doi.org/10.51094/jxiv.592

Keywords:

genetic code, transmembrane domains, intrinsically disordered regions, purpose, origin

Abstract

All organisms on Earth share a nearly identical genetic code, the most common form of which is called the standard genetic code. In previous research, based on studies of a possible inverse translation of the genetic code applied to various protein amino acid sequences, I proposed that the genetic code uses the local thymine density of the gene sequence to determine where transmembrane domains (TMDs) or intrinsically disordered regions (IDRs) occur in proteins. However, I had not analyzed how each specific codon-amino acid correspondence supports this hypothesis.
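As a rough illustration of what "local thymine density" could mean in practice, the following minimal sketch computes the fraction of T within a sliding window along a coding sequence. The function name, the window length, and the toy sequence are assumptions made for illustration only; they are not taken from the author's analysis.

```python
def thymine_density(dna, window=30):
    """Return the fraction of T in each sliding window of a DNA sequence.

    `window` is a hypothetical choice; the source does not state a window size.
    """
    dna = dna.upper()
    if window <= 0 or len(dna) < window:
        return []
    return [
        dna[i:i + window].count("T") / window
        for i in range(len(dna) - window + 1)
    ]


# Toy example (made-up sequence): density falls as the window leaves the T-rich end.
toy_density = thymine_density("TTTTAAAA", window=4)
```

Under the hypothesis described above, windows with high values would tend to overlap TMD-coding stretches and windows with low values IDR-coding stretches; the sketch only computes the density and does not test that claim.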

In this study, I examined how the amino acid compositions of TMDs and IDRs differ across organisms by comparing, for each organism, the ratios of the average amino acid residue compositions of TMDs and of IDRs to that of the total sequence of each protein. The results showed that, across all 20 amino acids, the TMD-to-total and IDR-to-total ratios were almost inverse to each other and were highly consistent across organisms. This consistency suggests that, regardless of species, TMDs and IDRs each have distinct characteristics in their amino acid composition. Furthermore, a comparison of these results with the codons corresponding to each amino acid in the genetic code revealed that the standard genetic code predominantly assigns uracil-containing codons to amino acids enriched in transmembrane domains and uracil-free codons to amino acids enriched in intrinsically disordered regions.
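The two quantities compared in this paragraph can be sketched in a few lines of Python. The block below is only an illustration under stated assumptions: the codon table is the standard genetic code (NCBI translation table 1), while the helper names, the per-sequence form of the ratio, and the toy sequences are hypothetical and do not reproduce the author's actual pipeline or data.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# UUU, UUC, UUA, UUG, UCU, ... and "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def uracil_codon_fraction(table=CODON_TABLE):
    """Fraction of each amino acid's codons that contain at least one U."""
    codons_by_aa = {}
    for codon, aa in table.items():
        if aa != "*":  # stop codons are not amino acids
            codons_by_aa.setdefault(aa, []).append(codon)
    return {
        aa: sum("U" in codon for codon in codons) / len(codons)
        for aa, codons in codons_by_aa.items()
    }


def composition(seq):
    """Relative frequency of each residue in an amino acid sequence."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def enrichment_ratio(region_seq, full_seq):
    """Region composition divided by whole-sequence composition, per residue.

    Assumption: a simple per-sequence ratio; the source averages over proteins.
    """
    region, full = composition(region_seq), composition(full_seq)
    return {aa: region[aa] / full[aa] for aa in region if aa in full}


u_fraction = uracil_codon_fraction()

# Hypothetical toy protein with a hydrophobic stretch standing in for a TMD.
toy_full = "MKLLLVVIFAEEKKDDSS"
toy_tmd = "LLLVVIF"
toy_ratio = enrichment_ratio(toy_tmd, toy_full)
```

In this table, Phe, Leu, Ile, Met, Val, Tyr, Cys, and Trp are encoded only by U-containing codons, whereas Lys, Glu, and Gln have none. Placing `u_fraction` next to ratios computed from real annotated TMD and IDR sequences is one way to reproduce the kind of comparison described above.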

In another recent study, I showed that TMD-rich and IDR-rich proteins consistently represent two of the most statistically distinct domains/regions in amino acid composition within the proteome of any organism. Combining this with the earlier finding that the genetic code is structured so that TMDs and IDRs are each encoded by gene sequences of a specific nucleotide composition, I concluded that this may explain why the standard genetic code is universal. The results of the current study show that this differentiating function of the genetic code rests on an elaborate, simultaneous coordination of codon-amino acid correspondences. This finding supports the idea that the structure of the standard genetic code, which is influenced by the commonality of TMDs and IDRs, is unlikely to be a product of mere chance and at least serves a purpose in differentiating these regions. It should provide crucial insight into the still-undiscovered origins of the standard genetic code, as statistically the largest piece of that puzzle. At the same time, this piece is likely quite small relative to the overall complexity of the mystery.

Conflicts of Interest Disclosure

The author declares no conflicts of interest associated with this manuscript.

References

Crick, F. H. C. (1968). The origin of the genetic code. In Journal of Molecular Biology (Vol. 38, Issue 3, pp. 367–379). Elsevier BV. https://doi.org/10.1016/0022-2836(68)90392-6

Koonin, E. V., & Novozhilov, A. S. (2008). Origin and evolution of the genetic code: The universal enigma. In IUBMB Life (Vol. 61, Issue 2, pp. 99–111). Wiley. https://doi.org/10.1002/iub.146

Wnętrzak, M., Błażej, P., Mackiewicz, D., & Mackiewicz, P. (2018). The optimality of the standard genetic code assessed by an eight-objective evolutionary algorithm. In BMC Evolutionary Biology (Vol. 18, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1186/s12862-018-1304-0

Esumi, G. (2023). The standard genetic code is designed to generate transmembrane domains and intrinsically disordered regions as projections of the thymine density on the gene. Jxiv. https://doi.org/10.51094/jxiv.533

"Quest for Orthologs" group. (2023) Reference proteomes - Primary proteome sets for the Quest For Orthologs, RELEASE 2023_03. https://www.ebi.ac.uk/reference_proteomes/ Accessed 1 Sep 2023

Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Ahmad, S., Alpi, E., Bowler-Barnett, E. H., Britto, R., Bye-A-Jee, H., Cukura, A., Denny, P., Dogan, T., Ebenezer, T., Fan, J., Garmiri, P., da Costa Gonzales, L. J., Hatton-Ellis, E., Hussein, A., … Zhang, J. (2022). UniProt: the Universal Protein Knowledgebase in 2023. In Nucleic Acids Research (Vol. 51, Issue D1, pp. D523–D531). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkac1052 Accessed 1 Sep 2023

Hamashima, K., & Kanai, A. (2013). Alternative genetic code for amino acids and transfer RNA revisited. In BioMolecular Concepts (Vol. 4, Issue 3, pp. 309–318). Walter de Gruyter GmbH. https://doi.org/10.1515/bmc-2013-0002

Osawa, S., Ohama, T., Jukes, T. H., & Watanabe, K. (1989). Evolution of the mitochondrial genetic code I. Origin of AGR serine and stop codons in metazoan mitochondria. In Journal of Molecular Evolution (Vol. 29, Issue 3, pp. 202–207). Springer Science and Business Media LLC. https://doi.org/10.1007/bf02100203

Esumi, G. (2023). Statistical Extremes of Amino Acid Residue Composition of the Proteome Proteins Can Explain the Origin of the Universality of the Genetic Code. Jxiv. https://doi.org/10.51094/jxiv.575

Prilusky, J., & Bibi, E. (2009). Studying membrane proteins through the eyes of the genetic code revealed a strong uracil bias in their coding mRNAs. In Proceedings of the National Academy of Sciences (Vol. 106, Issue 16, pp. 6662–6666). Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.0902029106

Vakirlis, N., Acar, O., Hsu, B., Castilho Coelho, N., Van Oss, S. B., Wacholder, A., Medetgul-Ernar, K., Bowman, R. W., II, Hines, C. P., Iannotta, J., Parikh, S. B., McLysaght, A., Camacho, C. J., O’Donnell, A. F., Ideker, T., & Carvunis, A.-R. (2020). De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. In Nature Communications (Vol. 11, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1038/s41467-020-14500-z

Esumi, G. (2023). The α-helical transmembrane domains and intrinsically disordered regions on the human proteins are coded for by the skews of their genes' nucleic acid composition with the "universal" assignment of the genetic code table. Jxiv. https://doi.org/10.51094/jxiv.247

Esumi, G. (2023). The TA Skew of a Gene Primarily Determines the Type of Protein, Such as Membrane Protein or Intrinsically Disordered Protein. Jxiv. https://doi.org/10.51094/jxiv.446

Posted


Submitted: 2024-01-05 05:49:44 UTC

Published: 2024-01-10 09:00:15 UTC
Section
Biology, Life Sciences & Basic Medicine