Preprint / Version 1

The Gene’s GC Content Is the Greatest Source of Inter-Species Differences in Protein Amino Acid Composition

DOI:

https://doi.org/10.51094/jxiv.1061

Keywords:

Amino acid composition, GC content, TA skew, Species Difference, Diversity

Abstract

Organisms synthesize proteins based on sequences of 20 amino acids specified by their genes, and protein function is determined by these amino acid sequences and compositions. Previous studies in Bacteria have shown that an organism’s genomic GC content is a key determinant of the amino acid composition of its proteins. However, whether this relationship generalizes to organisms from the other domains of life has remained unclear.

In this study, I performed principal component analysis (PCA) on the amino acid compositions of approximately 1.5 million proteins from 81 species spanning all three domains of life and examined how their principal component scores varied among species. The results revealed that, while the first principal component exhibited considerable variation among species, the variation in all other principal components was significantly limited.
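The analysis described above can be illustrated with a minimal sketch. It is not the author's pipeline: the helper names (`composition`, `pca`, `between_species_fraction`), the three toy "species", and the randomly generated sequences are all assumptions made for illustration, and NumPy is assumed to be available. The sketch computes 20-dimensional amino acid compositions, runs PCA via SVD, and reports, for each component, the share of score variance that lies between species.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in one protein sequence."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / counts.sum()


def pca(X, n_components=3):
    """PCA via SVD of the mean-centered matrix; returns (scores, variances)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = (U * S)[:, :n_components]
    variances = (S ** 2 / (len(X) - 1))[:n_components]
    return scores, variances


def between_species_fraction(scores, species):
    """Per component: share of score variance explained by species means."""
    species = np.asarray(species)
    fitted = np.zeros_like(scores)
    for s in np.unique(species):
        mask = species == s
        fitted[mask] = scores[mask].mean(axis=0)
    return fitted.var(axis=0) / scores.var(axis=0)


# Toy data (hypothetical): three species whose proteins differ only in how
# often residues with GC-rich codons (G, A, R, P) replace residues with
# AT-rich codons (K, N, I, F).
rng = np.random.default_rng(0)
gc_rich = [AMINO_ACIDS.index(a) for a in "GARP"]
at_rich = [AMINO_ACIDS.index(a) for a in "KNIF"]

X, species = [], []
for name, shift in [("sp_low_gc", -0.04), ("sp_mid_gc", 0.0), ("sp_high_gc", 0.04)]:
    probs = np.full(20, 0.05)
    probs[gc_rich] += shift
    probs[at_rich] -= shift
    for _ in range(40):
        seq = "".join(rng.choice(list(AMINO_ACIDS), size=300, p=probs))
        X.append(composition(seq))
        species.append(name)

scores, variances = pca(np.array(X), n_components=3)
fractions = between_species_fraction(scores, species)
print("explained variance:", variances)
print("between-species share per component:", fractions)
```

In this toy setting, the first component carries nearly all of the between-species variation by construction, which mirrors the pattern the abstract reports for real proteomes but does not reproduce its data or results.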

To investigate this further, I developed a function to back-calculate the GC content of a gene from its amino acid composition under the assumption of equal usage of synonymous codons. I then compared the estimated GC content derived from this reverse transformation with the first principal component from the PCA, observing a correlation coefficient of 0.98, which indicates an almost perfect match. Because the first principal component of amino acid composition was essentially the only component that showed substantial interspecies variation, and its values strongly correlated with the back-calculated GC content, I conclude that the greatest source of diversity in protein amino acid composition lies in the gene’s GC content, which is substantially governed by the organism’s genomic GC content.
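One way such a back-calculation could work is sketched below. It is an illustrative reconstruction, not the author's function: it assumes the standard genetic code (NCBI translation table 1), excludes stop codons, and assigns each amino acid the mean GC fraction of its synonymous codons, which is what equal synonymous codon usage implies. The names `CODON_TABLE`, `AA_GC`, and `estimate_gc` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), CODE)}


def gc_per_amino_acid():
    """Mean GC fraction over each amino acid's synonymous codons."""
    codons = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons do not encode a residue
            codons[aa].append(codon)
    return {
        aa: sum(c.count("G") + c.count("C") for c in cs) / (3 * len(cs))
        for aa, cs in codons.items()
    }


AA_GC = gc_per_amino_acid()


def estimate_gc(protein_seq):
    """Estimated gene GC content implied by a protein's residue composition."""
    residues = [r for r in protein_seq.upper() if r in AA_GC]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(AA_GC[r] for r in residues) / len(residues)


# Lysine (AAA, AAG) implies low GC; glycine (GGN) implies high GC.
print(round(AA_GC["K"], 3), round(AA_GC["G"], 3))
print(round(estimate_gc("MKKLLNNIIFF"), 3), round(estimate_gc("MGGAARRPPGA"), 3))
```

Because the estimate is a composition-weighted average of fixed per-residue values, it is a linear function of the 20-dimensional composition vector, which makes it directly comparable to a principal component score.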

Conflict of Interest Disclosure

No competing interests are declared.

References

Du, M.-Z., Zhang, C., Wang, H., Liu, S., Wei, W., & Guo, F.-B. (2018). The GC Content as a Main Factor Shaping the Amino Acid Usage During Bacterial Evolution Process. Frontiers in Microbiology, 9, 2948. https://doi.org/10.3389/fmicb.2018.02948

EMBL-EBI. (2024). Reference Proteomes (Release 2024_02) [Database]. Retrieved January 7, 2025, from https://www.ebi.ac.uk/reference_proteomes/

Esumi, G. (2023). Statistical Extremes of Amino Acid Residue Composition of the Proteome Proteins Can Explain the Origin of the Universality of the Genetic Code. Jxiv. https://doi.org/10.51094/jxiv.575

Esumi, G. (2023). The Synonymous Codon Usage of a Protein Gene Is Primarily Determined by the Guanine + Cytosine Content of the Individual Gene Rather Than the Species to Which It Belongs To Synthesize Proteins With a Balanced Amino Acid Composition. Jxiv. https://doi.org/10.51094/jxiv.561

Published

Submitted: 2025-01-27 00:32:42 UTC

Published: 2025-01-28 10:19:23 UTC

Research field

Biology, Life Sciences, and Basic Medicine