The Gene’s GC Content Is the Greatest Source of Inter-Species Differences in Protein Amino Acid Composition
DOI:
https://doi.org/10.51094/jxiv.1061
Keywords:
Amino acid composition, GC content, TA skew, Species Difference, Diversity
Abstract
Organisms synthesize proteins based on sequences of 20 amino acids specified by their genes, and protein function is determined by these amino acid sequences and compositions. Previous studies in Bacteria have shown that an organism’s genomic GC content is a key determinant of the amino acid composition of its proteins. However, whether this relationship generalizes to organisms in the other domains of life has remained unclear.
In this study, I performed principal component analysis (PCA) on the amino acid compositions of approximately 1.5 million proteins from 81 species spanning all three domains of life and examined how their principal component scores varied among species. The results revealed that, while the first principal component varied considerably among species, the variation in all other principal components was far more limited.
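The PCA step described above can be sketched on a toy scale as follows. This is only an illustration under stated assumptions, not the author's pipeline: it uses plain Python and power iteration to find the first principal component (a numerical library would normally be used instead), the helper names `composition` and `first_principal_component` are hypothetical, and the four sequences are made up so that they differ only in their lysine/glycine balance.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the fraction of each standard amino acid in a protein."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [count / total for count in counts]


def first_principal_component(rows, iterations=100):
    """Return (axis, scores) for the first principal component of rows.

    Uses power iteration on the sample covariance matrix; assumes the
    data have non-zero variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Start from a non-constant vector: compositions sum to 1, so a
    # constant vector lies in the null space of the covariance matrix.
    axis = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * axis[j] for j in range(d))
                   for i in range(d)]
        norm = sum(x * x for x in product) ** 0.5
        axis = [x / norm for x in product]
    # Project each centered composition onto the first axis.
    scores = [sum(c[j] * axis[j] for j in range(d)) for c in centered]
    return axis, scores


# Made-up toy sequences (assumption: not real proteins).
toy_proteins = ["KKKKKKGGACDE", "KKKKKGGGACDE", "KKKGGGGGACDE", "KKGGGGGGACDE"]
axis, scores = first_principal_component([composition(p) for p in toy_proteins])
```

In this toy data set the first axis simply contrasts lysine (AT-rich codons) against glycine (GC-rich codons), so the scores order the sequences by their lysine fraction.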
To investigate this further, I developed a function to back-calculate the GC content of a gene from its amino acid composition under the assumption of equal usage of synonymous codons. I then compared the estimated GC content derived from this reverse transformation with the first principal component from the PCA, observing a correlation coefficient of 0.98, which indicates an almost perfect match. Because the first principal component of amino acid composition was essentially the only component that showed substantial interspecies variation, and its values strongly correlated with the back-calculated GC content, I conclude that the greatest source of diversity in protein amino acid composition lies in the gene’s GC content, which is substantially governed by the organism’s genomic GC content.
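A back-calculation of this kind can be sketched as follows. This is a minimal illustration rather than the author's function: it assumes the standard genetic code, treats every synonymous codon of a residue as equally likely, ignores stop codons and non-standard residues, and the names `MEAN_GC` and `estimate_gc_content` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (first base varies slowest); "*" marks stop codons.
BASES = "TCAG"
CODON_TABLE = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def mean_gc_per_amino_acid():
    """Mean GC fraction over the synonymous codons of each amino acid."""
    gc_values = defaultdict(list)
    for codon, aa in zip(product(BASES, repeat=3), CODON_TABLE):
        if aa == "*":
            continue  # stop codons do not encode a residue
        gc_values[aa].append(sum(base in "GC" for base in codon) / 3)
    return {aa: sum(vals) / len(vals) for aa, vals in gc_values.items()}


MEAN_GC = mean_gc_per_amino_acid()


def estimate_gc_content(protein_sequence):
    """Estimate the coding-sequence GC fraction from a protein sequence,
    assuming equal usage of synonymous codons."""
    values = [MEAN_GC[aa] for aa in protein_sequence.upper() if aa in MEAN_GC]
    if not values:
        raise ValueError("no standard amino acid residues found")
    return sum(values) / len(values)
```

Under these assumptions, a lysine-only sequence (codons AAA and AAG) yields an estimate of 1/6, and a glycine-only sequence (codons GGT, GGC, GGA, GGG) yields 5/6, which shows how amino acid composition alone can span a wide range of implied GC content.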
Conflict of Interest Disclosure
No competing interests are declared.
References
Du, M.-Z., Zhang, C., Wang, H., Liu, S., Wei, W., & Guo, F.-B. (2018). The GC Content as a Main Factor Shaping the Amino Acid Usage During Bacterial Evolution Process. Frontiers Media SA. https://doi.org/10.3389/fmicb.2018.02948
EMBL-EBI. (2024). Reference Proteomes (Release 2024_02) [Database]. Retrieved January 7, 2025, from https://www.ebi.ac.uk/reference_proteomes/
Esumi, G. (2023). Statistical Extremes of Amino Acid Residue Composition of the Proteome Proteins Can Explain the Origin of the Universality of the Genetic Code. Jxiv. https://doi.org/10.51094/jxiv.575
Esumi, G. (2023). The Synonymous Codon Usage of a Protein Gene Is Primarily Determined by the Guanine + Cytosine Content of the Individual Gene Rather Than the Species to Which It Belongs To Synthesize Proteins With a Balanced Amino Acid Composition. Jxiv. https://doi.org/10.51094/jxiv.561
Published
Submitted: 2025-01-27 00:32:42 UTC
Published: 2025-01-28 10:19:23 UTC
License
Copyright(c)2025
Genshiro Esumi
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.