The Nucleic Acid Sequences of the Genome Are Highly Structured on a Genome-Wide Scale in Terms of Nucleic Acid Composition Indices Such as TA Skew and GC Skew

Esumi, Genshiro

doi:10.51094/jxiv.436

Authors

Esumi, Genshiro Department of Pediatric Surgery, Hospital of the University of Occupational and Environmental Health https://orcid.org/0000-0003-0618-9943 https://researchmap.jp/esumig

DOI:

https://doi.org/10.51094/jxiv.436

Keywords:

genome structure, TA skew, GC skew, GC content, symmetry

Abstract

Nucleic acid sequences in the genome are often assumed to be random because mutations are thought to occur randomly. Several analyses, however, have shown that these sequences have a certain structural nature, yet the role of these structures in biological activity has rarely been reported.

In a previous report, I showed that high and low TA skew within a gene correspond to the transmembrane domains and the intrinsically disordered regions of the encoded protein, respectively. In this paper, I therefore examined how TA skew varies together with two other nucleic acid composition indices, GC skew and GC content. Each index was calculated in a 1000 base pair window, which is close to the size of most genes, and the window was moved in one base pair steps over the entire length of the genome sequence.
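The windowed calculation described above can be sketched as follows. This is a minimal illustration, not the author's code: the function name `sliding_indices` is hypothetical, and the formulas are assumptions based on the commonly used definitions, TA skew = (T - A)/(T + A), GC skew = (G - C)/(G + C), and GC content = (G + C)/(A + T + G + C), since the abstract does not state them.

```python
from collections import Counter


def sliding_indices(seq, window=1000):
    """Yield (ta_skew, gc_skew, gc_content) for every window, moving 1 bp at a time.

    Assumed definitions (not stated in the abstract):
      TA skew    = (T - A) / (T + A)
      GC skew    = (G - C) / (G + C)
      GC content = (G + C) / (A + T + G + C)
    Characters other than A, T, G, C (for example N) are ignored in the ratios.
    """
    seq = seq.upper()
    if len(seq) < window:
        return
    # Count the first window once, then update the counts incrementally.
    counts = Counter(seq[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            counts[seq[start - 1]] -= 1           # base leaving the window
            counts[seq[start + window - 1]] += 1  # base entering the window
        a, t, g, c = (counts[base] for base in "ATGC")
        ta_skew = (t - a) / (t + a) if t + a else 0.0
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        total = a + t + g + c
        gc_content = (g + c) / total if total else 0.0
        yield ta_skew, gc_skew, gc_content


# Toy input with a short window so the result is easy to inspect by hand.
for row in sliding_indices("TTTTGGGGAAAACCCC", window=8):
    print(row)
```

Updating the counts as the window slides keeps the cost per step constant, which matters when the step is one base pair over a whole chromosome.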

In this study, these three indices were calculated for the genomes of three bacterial species and for the sequence of human chromosome 1. The results showed that the distribution of GC content differed between species, whereas the distributions of TA skew and GC skew were symmetric about zero, extending equally in the positive and negative directions. In addition, scatter plots of TA skew against GC skew showed a rotationally symmetric distribution in each species.

It has been previously reported that, in genome sequences above a certain length, the numbers of T and A and the numbers of G and C are almost equal; this is known as Chargaff's second parity rule. However, the correlated behavior of TA skew and GC skew across the genome has not been reported before, and this is the first report of the phenomenon.
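As a small illustration of what the parity rule means in terms of skew, the sketch below computes whole-sequence TA skew and GC skew for a single strand. It uses a toy sequence rather than real genome data, the helper names `reverse_complement` and `strand_skews` are hypothetical, and the skew formulas (T - A)/(T + A) and (G - C)/(G + C) are assumed definitions. A fragment joined to its own reverse complement has exactly balanced counts on one strand, so both skews are zero even though the fragment alone is strongly skewed.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, G<->C)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def strand_skews(seq):
    """Return (ta_skew, gc_skew) for one strand, using the assumed definitions."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    ta_skew = (t - a) / (t + a) if t + a else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return ta_skew, gc_skew


fragment = "TTTGGGTTGA"  # toy fragment, rich in T and G
combined = fragment + reverse_complement(fragment)

print(strand_skews(fragment))  # both skews are positive for the fragment alone
print(strand_skews(combined))  # both skews are zero: T = A and G = C on this strand
```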

The nucleic acid sequences of the genome, which are often thought to be random, are in fact highly structured on a genome-wide scale in terms of nucleic acid composition. I speculate that organisms exploit the cooperative association between this structural nature of the genome and the functional assignment of the genetic code to achieve functional protein synthesis. However, how such large-scale genome structures are built and maintained remains a mystery.

Conflict of Interest Disclosure

The author declares no conflicts of interest associated with this manuscript.

References

Gautier, C. (2000). Compositional bias in DNA. Current Opinion in Genetics & Development, 10(6), 656–661. https://doi.org/10.1016/S0959-437X(00)00144-1

Esumi, G. (2023). The α-helical transmembrane domains and intrinsically disordered regions on the human proteins are coded for by the skews of their genes' nucleic acid composition with the "universal" assignment of the genetic code table. Jxiv. https://doi.org/10.51094/jxiv.247

Anaeromyxobacter dehalogenans 2CP-C, complete genome, GenBank: CP000251.1, NCBI website, https://www.ncbi.nlm.nih.gov/nuccore/CP000251

Escherichia coli str. K-12 substr. MG1655, complete genome, NCBI Reference Sequence: NC_000913.3, NCBI website, https://www.ncbi.nlm.nih.gov/nuccore/556503834

Candidatus Zinderia insecticola CARI, complete genome, NCBI Reference Sequence: NC_014497.1, NCBI website, https://www.ncbi.nlm.nih.gov/nuccore/NC_014497.1

Genome dataset, “Homo sapiens (human) / Genome assembly T2T-CHM13v2.0”, NCBI website, https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_009914755.1/

Almpanis, A., Swain, M., Gatherer, D., & McEwan, N. (2018). Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages. Microbial Genomics, 4(4). https://doi.org/10.1099/mgen.0.000168

Rudner, R., Karkas, J. D., & Chargaff, E. (1968). Separation of B. subtilis DNA into complementary strands. 3. Direct analysis. Proceedings of the National Academy of Sciences, 60(3), 921–922. https://doi.org/10.1073/pnas.60.3.921

Lobry, J. R. (1996). Asymmetric substitution patterns in the two DNA strands of bacteria. Molecular Biology and Evolution, 13(5), 660–665. https://doi.org/10.1093/oxfordjournals.molbev.a025626

Esumi, G. (2022). Synonymous codon usage and its bias in the bacterial proteomes primarily offset guanine and cytosine content variation to maintain optimal amino acid compositions. Jxiv. https://doi.org/10.51094/jxiv.99