The Nucleic Acid Sequences of the Genome Are Highly Structured on a Genome-Wide Scale in Terms of Nucleic Acid Composition Indices Such as TA Skew and GC Skew
DOI:
https://doi.org/10.51094/jxiv.436
Keywords:
genome structure, TA skew, GC skew, GC content, symmetry
Abstract
Nucleic acid sequences in the genome are often assumed to be random because mutations are thought to occur randomly. In fact, several analyses have shown that these sequences have structural regularities. However, the role of such structures in biological activity has rarely been reported.
In a previous report, I showed that high and low TA skew in a gene correspond to the transmembrane domains and the intrinsically disordered regions of the encoded protein, respectively. In this paper, I therefore examined how TA skew varies together with two other nucleic acid composition indices, GC skew and GC content. Each index was calculated in 1000-base-pair windows, a size close to that of most genes, moved in one-base-pair steps along the entire length of the genome sequence.
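The windowed calculation described above can be sketched as follows. This is a minimal illustration, not the author's code. It assumes the conventional definitions TA skew = (T − A)/(T + A), GC skew = (G − C)/(G + C), and GC content = (G + C)/(A + C + G + T), which the abstract does not spell out. The function names `sliding_indices` and `_indices` are hypothetical, and ignoring non-ACGT characters such as N is also an assumption.

```python
def _indices(counts):
    """Return (ta_skew, gc_skew, gc_content) from a dict of base counts.

    Assumed definitions (not stated in the abstract):
      TA skew    = (T - A) / (T + A)
      GC skew    = (G - C) / (G + C)
      GC content = (G + C) / (A + C + G + T)
    A zero denominator yields 0.0 rather than raising an error.
    """
    ta = counts["T"] + counts["A"]
    gc = counts["G"] + counts["C"]
    total = ta + gc
    ta_skew = (counts["T"] - counts["A"]) / ta if ta else 0.0
    gc_skew = (counts["G"] - counts["C"]) / gc if gc else 0.0
    gc_content = gc / total if total else 0.0
    return ta_skew, gc_skew, gc_content


def sliding_indices(seq, window=1000, step=1):
    """Yield (start, ta_skew, gc_skew, gc_content) for each window.

    Base counts are updated incrementally as the window slides, so the
    cost is proportional to the sequence length, not length * window.
    Characters other than A, C, G, T are ignored (an assumption).
    """
    seq = seq.upper()
    n = len(seq)
    if n < window:
        return  # sequence shorter than one window: nothing to yield

    # Count the bases in the first window.
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in seq[:window]:
        if base in counts:
            counts[base] += 1

    start = 0
    while True:
        if start % step == 0:
            yield (start,) + _indices(counts)
        end = start + window
        if end >= n:
            break
        # Slide by one base: drop the leftmost, add the next on the right.
        if seq[start] in counts:
            counts[seq[start]] -= 1
        if seq[end] in counts:
            counts[seq[end]] += 1
        start += 1


if __name__ == "__main__":
    # Toy sequence with a 4-base window, purely for illustration.
    for row in sliding_indices("TTTAGGGC", window=4):
        print(row)
```

For a real genome, the sequence string would be read from a FASTA file and passed with the default `window=1000` and `step=1`. The rows can then be collected for histograms or scatter plots.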
These three indices were calculated for the genomes of three bacterial species and for the sequence of human chromosome 1. The distribution of GC content differed between species, whereas the distributions of TA skew and GC skew were each symmetric about zero, extending equally to positive and negative values. In addition, scatter plots of TA skew against GC skew showed a rotationally symmetric distribution in each species.
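To make the rotational symmetry concrete, the short sketch below shows what a 180-degree rotation of a point in the (TA skew, GC skew) plane corresponds to at the sequence level. It again assumes the conventional definitions (T − A)/(T + A) and (G − C)/(G + C). Under these, reading a window from the complementary strand negates both skews, so the point (x, y) maps to (−x, −y). This is only an illustration of the geometry, not an explanation of the observed distributions, and the helper names `reverse_complement` and `skews` are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def skews(seq):
    """Return (ta_skew, gc_skew) for a sequence.

    Assumed definitions: (T - A)/(T + A) and (G - C)/(G + C).
    A zero denominator yields 0.0.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    ta_skew = (t - a) / (t + a) if (t + a) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return ta_skew, gc_skew


if __name__ == "__main__":
    window = "TTTAGGGC"  # toy window, purely for illustration
    x, y = skews(window)
    rx, ry = skews(reverse_complement(window))
    # Both skews change sign on the opposite strand:
    # (x, y) -> (-x, -y), a 180-degree rotation about the origin.
    print((x, y), (rx, ry))
```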
It has previously been reported that, in genomic sequences above a certain length, the number of T is almost equal to the number of A and the number of G is almost equal to the number of C; this is known as Chargaff's second parity rule. However, the correlated behavior of TA skew and GC skew across the genome has not been reported, and this is the first report of the phenomenon.
Nucleic acid sequences in the genome, often thought to be random, are in fact highly structured on a genome-wide scale in terms of nucleic acid composition. I speculate that organisms exploit the cooperative association between this structural nature of the genome and the functional assignment of the genetic code to achieve functional protein synthesis. However, how such large-scale genome structures are built and maintained remains a mystery.
Conflicts of Interest Disclosure
The author declares no conflicts of interest associated with this manuscript.
References
Gautier, C. (2000). Compositional bias in DNA. Current Opinion in Genetics & Development, 10(6), 656–661. https://doi.org/10.1016/S0959-437X(00)00144-1
Esumi, G. (2023). The α-helical transmembrane domains and intrinsically disordered regions on the human proteins are coded for by the skews of their genes' nucleic acid composition with the "universal" assignment of the genetic code table. Jxiv. https://doi.org/10.51094/jxiv.247
Anaeromyxobacter dehalogenans 2CP-C, complete genome, GenBank: CP000251.1, NCBI website, https://www.ncbi.nlm.nih.gov/nuccore/CP000251
Escherichia coli str. K-12 substr. MG1655, complete genome, NCBI Reference Sequence: NC_000913.3, NCBI website, https://www.ncbi.nlm.nih.gov/nuccore/556503834
Candidatus Zinderia insecticola CARI, complete genome, NCBI Reference Sequence: NC_014497.1, NCBI website, https://www.ncbi.nlm.nih.gov/nuccore/NC_014497.1
Genome dataset, "Homo sapiens (human) / Genome assembly T2T-CHM13v2.0", NCBI website, https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_009914755.1/
Almpanis, A., Swain, M., Gatherer, D., & McEwan, N. (2018). Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages. Microbial Genomics, 4(4). https://doi.org/10.1099/mgen.0.000168
Rudner, R., Karkas, J. D., & Chargaff, E. (1968). Separation of B. subtilis DNA into complementary strands. 3. Direct analysis. Proceedings of the National Academy of Sciences, 60(3), 921–922. https://doi.org/10.1073/pnas.60.3.921
Lobry, J. R. (1996). Asymmetric substitution patterns in the two DNA strands of bacteria. Molecular Biology and Evolution, 13(5), 660–665. https://doi.org/10.1093/oxfordjournals.molbev.a025626
Esumi, G. (2022). Synonymous codon usage and its bias in the bacterial proteomes primarily offset guanine and cytosine content variation to maintain optimal amino acid compositions. Jxiv. https://doi.org/10.51094/jxiv.99
Posted
Submitted: 2023-07-01 23:48:46 UTC
Published: 2023-07-04 04:47:06 UTC
License
Copyright (c) 2023
Genshiro Esumi
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.