Preprint / Version 1

The α-helical transmembrane domains and intrinsically disordered regions on the human proteins are coded for by the skews of their genes' nucleic acid composition with the "universal" assignment of the genetic code table


DOI:

https://doi.org/10.51094/jxiv.247

Keywords:

transmembrane domains, intrinsically disordered regions, nucleic acid composition, TA skew, GC content, genetic code table, Chargaff's second parity rule

Abstract

It is widely known that, in living organisms, the nucleic acid sequences of genes are translated into amino acid sequences according to the genetic code table, yielding proteins with various functions. It is also well known that the function of an individual protein is strongly influenced by its amino acid composition. By contrast, there are few reports on how a gene's nucleic acid composition affects its protein's function.

In this study, using publicly available data from UniProt and NCBI, the annotation of each human protein was matched to its gene's coding sequence by RefSeq ID, yielding 25,095 matched proteins. First, I calculated, for each protein, the fraction of its sequence occupied by α-helical transmembrane domains and the fraction occupied by intrinsically disordered regions. Second, I made a scatter plot of each gene's GC content against its TA skew. Third, I compared the plots with these fractions. The plots show that proteins with higher fractions of α-helical transmembrane domains occupy the area of higher TA skew, whereas proteins with higher fractions of intrinsically disordered regions occupy the area of lower TA skew and, in part, the area of higher GC content.
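The per-gene and per-protein quantities described above can be sketched as follows. This is a minimal illustration, not the author's pipeline: the abstract does not define TA skew, so the common form (T − A) / (T + A) is assumed here, and the function names and the 1-based inclusive region coordinates are hypothetical choices made for the example.

```python
from collections import Counter


def gc_content(cds: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of a coding sequence."""
    counts = Counter(cds.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (counts["G"] + counts["C"]) / total


def ta_skew(cds: str) -> float:
    """TA skew, ASSUMED here to be (T - A) / (T + A); the source gives no formula."""
    counts = Counter(cds.upper())
    t, a = counts["T"], counts["A"]
    if t + a == 0:
        raise ValueError("sequence contains no T or A bases")
    return (t - a) / (t + a)


def region_fraction(protein_length: int, regions: list[tuple[int, int]]) -> float:
    """Fraction of residues covered by annotated regions.

    Regions are ASSUMED to be 1-based inclusive (start, end) pairs;
    overlapping regions are counted only once.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    covered: set[int] = set()
    for start, end in regions:
        covered.update(range(start, end + 1))
    return len(covered) / protein_length
```

Each gene then contributes one point (GC content, TA skew) to the scatter plot, which can be colored by the corresponding protein's transmembrane or disordered fraction.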

Hydrophobic and hydrophilic amino acids cluster in the genetic code table, which has historically been explained by "robustness to mutations." However, in the actual assignment of the genetic code table, codons containing T in the first or second position correspond to the amino acids that mainly constitute α-helical transmembrane domains. In contrast, codons without T in the first and second positions correspond to amino acids characteristic of intrinsically disordered regions. The separation of the two types of proteins in the plots of this study is therefore speculated to originate from the assignment of the "universal" genetic code table.
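The codon partition described above can be reproduced directly from the standard genetic code. The sketch below builds the standard table (bases ordered T, C, A, G) and splits amino acids by whether any of their codons carries T in the first or second position. The function name is hypothetical, and the code only lists the two groups; it does not itself test which amino acids are enriched in transmembrane or disordered regions.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G at every position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_by_t(table: dict[str, str]) -> tuple[set[str], set[str]]:
    """Return (amino acids with a codon having T at position 1 or 2,
    amino acids with a codon lacking T at both positions); stops are skipped."""
    with_t: set[str] = set()
    without_t: set[str] = set()
    for codon, aa in table.items():
        if aa == "*":
            continue
        (with_t if "T" in codon[:2] else without_t).add(aa)
    return with_t, without_t


with_t, without_t = split_by_t(CODON_TABLE)
print(sorted(with_t))     # one-letter codes: C F I L M S V W Y
print(sorted(without_t))  # one-letter codes: A D E G H K N P Q R S T
```

Serine appears in both groups because its codons include both TCN and AGT/AGC.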

Chargaff's second parity rule (CSPR) states that, even within a single DNA strand, the numbers of thymine (T) and adenine (A), and likewise the numbers of guanine (G) and cytosine (C), are almost identical in a genome sequence if the sequence is sufficiently long. In contrast, the present study showed that the numbers of T and A differ in many protein-coding genes and that this skew distinguishes α-helical transmembrane domains from intrinsically disordered regions. The origin of CSPR has long been a mystery in bioscience. However, if the TA skews of genes determine the functions of their proteins, CSPR might have been maintained to keep the proper proportion of protein functions in the proteome. This assumption could support the view that CSPR is one of the constraints a genome must follow to maintain a functional proteome.
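The contrast between gene-level skew and strand-level parity can be illustrated with a small check. This is a toy sketch under stated assumptions: the deviation measure |x − y| / (x + y) and the function names are hypothetical, and the example strings are artificial, not real gene sequences.

```python
from collections import Counter


def parity_deviation(seq: str) -> dict[str, float]:
    """Relative within-strand imbalance for the T/A and G/C pairs.

    0.0 means perfect parity as CSPR would predict for a long strand;
    values near 1.0 mean one base of the pair dominates.
    """
    counts = Counter(seq.upper())

    def relative_gap(x: str, y: str) -> float:
        total = counts[x] + counts[y]
        return abs(counts[x] - counts[y]) / total if total else 0.0

    return {"TA": relative_gap("T", "A"), "GC": relative_gap("G", "C")}


def pooled_parity_deviation(seqs: list[str]) -> dict[str, float]:
    """Same measure after concatenating many sequences on one strand."""
    return parity_deviation("".join(seqs))
```

For example, the toy sequences "TTTA" and "AAAT" each have a TA deviation of 0.5, while pooling them gives 0.0: individually skewed segments can still sum to parity over a longer stretch.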

The results of this study might indicate that all organisms universally control the proportions of functional domains in their proteomes by using the universal genetic code table assignment and their non-random, precisely structured genome sequences.

Conflicts of Interest Disclosure

The author declares no conflicts of interest associated with this manuscript.


References

Alberts, B., et al. (2017). Molecular Biology of the Cell, 6th ed. (Japanese edition; supervising translators K. Nakamura & K. Matsubara). Newton Press.

Carugo, O. (2008). Amino acid composition and protein dimension. Protein Science, 17(12), 2187–2191. https://doi.org/10.1110/ps.037762.108

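Esumi, G. (江角 元史郎). (2022). Supporting the synthesis of transmembrane domains is an important function of the genetic code table arrangement (in Japanese). Jxiv. https://doi.org/10.51094/jxiv.139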

National Center for Biotechnology Information (NCBI). (2022). Genome assembly T2T-CHM13v2.0. National Library of Medicine (NIH) website. https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_009914755.1/

UniProt consortium. (2023). Proteomes · Homo sapiens (Human). UniProt website. https://www.uniprot.org/proteomes/UP000005640

Radványi, Á., & Kun, Á. (2021). The Mutational Robustness of the Genetic Code and Codon Usage in Environmental Context: A Non-Extremophilic Preference? Life, 11(8), 773. https://doi.org/10.3390/life11080773

Rudner, R., Karkas, J. D., & Chargaff, E. (1968). Separation of B. subtilis DNA into complementary strands. 3. Direct analysis. Proceedings of the National Academy of Sciences, 60(3), 921–922. https://doi.org/10.1073/pnas.60.3.921

Fariselli, P., Taccioli, C., Pagani, L., & Maritan, A. (2021). DNA sequence symmetries from randomness: the origin of the Chargaff’s second parity rule. Briefings in Bioinformatics, 22(2), 2172–2181. https://doi.org/10.1093/bib/bbaa041

Posted


Submitted: 2023-01-16 01:13:30 UTC

Published: 2023-01-23 08:45:46 UTC
Section
Biology, Life Sciences & Basic Medicine