Preprint / Version 1

Synthesis assistance of transmembrane domains is a fundamental function of the genetic code table assignment

DOI:

https://doi.org/10.51094/jxiv.139

Keywords:

genetic code table, transmembrane domains, amino acid composition, nucleic acid composition, principal component analysis

Abstract

The genetic code table maps nucleic acid sequences in genes to amino acid sequences in proteins. Although the assignment in this table is shared by almost all extant organisms, no clear explanation for why it takes this particular form has been established. This paper presents three analyses that provide some background on this question.

First, principal component analysis of the amino acid compositions of all human proteins showed that membrane proteins with two or more transmembrane domains were separated from proteins without such domains, predominantly along the second principal component and partially along the first. The eigenvectors indicated that these membrane proteins tend to contain larger amounts of certain amino acids: phenylalanine, tyrosine, isoleucine, methionine, tryptophan, cysteine, valine, and leucine. Many of these are hydrophobic. Moreover, all of them are encoded by codons that have uracil as their first or second letter in the genetic code table.
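The codon observation in the preceding paragraph can be checked directly against the standard genetic code. The following is a minimal sketch, not part of the original analysis: the table is the standard code written in the conventional UCAG order, and the names `CODON_TABLE`, `ENRICHED`, and `all_codons_have_u` are introduced here purely for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed with the first letter varying slowest
# and the third letter varying fastest, each in UCAG order ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter codes for Phe, Tyr, Ile, Met, Trp, Cys, Val, Leu.
ENRICHED = "FYIMWCVL"


def all_codons_have_u(aa):
    """True if every codon for `aa` has U as its first or second letter."""
    codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
    return all("U" in codon[:2] for codon in codons)


if __name__ == "__main__":
    for aa in ENRICHED:
        print(aa, all_codons_have_u(aa))
```

Under this table the check holds for all eight amino acids. Serine is the only other amino acid with some codons of this kind (UCN), but its AGU and AGC codons do not have uracil in the first or second position.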

Second, principal component analysis of the nucleic acid compositions of the genes encoding all human proteins revealed that their first and second principal components correlated strongly with the first and second principal components of the amino acid compositions described above, respectively. Furthermore, I found two additional correlations: the first principal component of the nucleic acid compositions correlates strongly with GC (guanine and cytosine) content, and the second principal component correlates strongly with the ratio of thymine to adenine + thymine.
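The two sequence statistics named above can be computed for a single coding sequence as follows. This is a minimal sketch under stated assumptions: the function names `gc_content` and `t_ratio` are hypothetical, ambiguous bases such as N are ignored, and the abstract does not specify how the original analysis handled such details.

```python
def _base_counts(seq):
    """Count A, C, G, T in a sequence, treating U as T and ignoring other symbols."""
    seq = seq.upper().replace("U", "T")
    return {base: seq.count(base) for base in "ACGT"}


def gc_content(seq):
    """Fraction (G + C) / (A + C + G + T); NaN if no unambiguous bases are present."""
    counts = _base_counts(seq)
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    return (counts["G"] + counts["C"]) / total


def t_ratio(seq):
    """Fraction T / (A + T); NaN if the sequence contains neither A nor T."""
    counts = _base_counts(seq)
    at = counts["A"] + counts["T"]
    if at == 0:
        return float("nan")
    return counts["T"] / at


if __name__ == "__main__":
    example = "ATGTTTCTGGTTATTTGA"  # illustrative sequence, not real data
    print(gc_content(example), t_ratio(example))
```

A value of `t_ratio` above 0.5 corresponds to a sequence with more thymine than adenine.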

Third, in a scatter plot of all human protein-coding genes by their ratio of thymine to adenine + thymine and their GC content, I found that membrane proteins are strongly associated with genes containing more thymine than adenine, regardless of GC content.

Membrane proteins account for 30% of all proteins, and their reliable synthesis must be one of the highest priorities for life. These membrane proteins contain transmembrane domains, each consisting of a stretch of about 20 or more consecutive hydrophobic amino acid residues. It has generally been assumed that organisms acquired their proteins by accumulating random mutations over the course of evolution. However, the results of this analysis indicate that even random mutations can generate amino acid compositions that form transmembrane domains much more efficiently simply by occurring in gene segments with more thymine than adenine. These results suggest that organisms exploit the assignment characteristics of the genetic code table to synthesize their transmembrane domains.
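The argument about random sequences can be illustrated with a simple calculation. The sketch below is an assumption-laden toy model rather than the method of this paper: each codon position is drawn independently from fixed base frequencies, G and C are assumed to be equally frequent, and stop codons are excluded. The names `base_freqs` and `enriched_fraction` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code on the coding DNA strand (T in place of U), "*" = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Amino acids reported above as enriched in membrane proteins.
TM_ENRICHED = set("FYIMWCVL")


def base_freqs(gc, t_ratio):
    """Base frequencies for a given GC content and T / (A + T) ratio.

    Assumes G and C are equally frequent (an illustrative simplification).
    """
    at = 1.0 - gc
    return {"T": at * t_ratio, "A": at * (1.0 - t_ratio), "G": gc / 2, "C": gc / 2}


def enriched_fraction(gc, t_ratio):
    """Expected share of enriched amino acids among sense codons
    when the three codon positions are independent draws."""
    freqs = base_freqs(gc, t_ratio)
    sense = 0.0
    enriched = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons do not contribute residues
        p = freqs[codon[0]] * freqs[codon[1]] * freqs[codon[2]]
        sense += p
        if aa in TM_ENRICHED:
            enriched += p
    return enriched / sense


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(gc, enriched_fraction(gc, 0.3), enriched_fraction(gc, 0.7))
```

In this toy model, raising the thymine share of A + T increases the expected fraction of the eight amino acids at each GC content tried, which is consistent in direction with the claim above. It says nothing about real genes, where codon positions are not independent.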

The synthesis assistance of transmembrane domains could therefore be a fundamental function of the current genetic code table. I hypothesize that the current genetic code table has continued to be selected for such beneficial functions and is now in a state of convergence.

Posted


Submitted: 2022-08-12 00:47:57 UTC

Published: 2022-08-16 01:52:59 UTC
Section
Biology, Life Sciences & Basic Medicine