Preprint / Version 1

GWASLab: a Python package for processing and visualizing GWAS summary statistics

Authors

  • Yunye He, Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo. https://orcid.org/0000-0001-8581-7826
  • Masaru Koido, Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo. https://orcid.org/0000-0002-0348-0666 https://researchmap.jp/mkoido
  • Yuka Shimmori, Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo
  • Yoichiro Kamatani, Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo

DOI:

https://doi.org/10.51094/jxiv.370

Keywords:

GWAS, Python, summary statistics, visualization, QC

Abstract

GWASLab is a comprehensive Python toolkit for processing and visualizing summary statistics (SumStats) derived from genome-wide association studies (GWAS). It provides functions for quality control (QC) of statistics, standardization of chromosome and allele notation, variant normalization, harmonization for meta-analysis, and data visualization. Because these functions are implemented as modules, users can assemble their own pipelines for working with SumStats. An expandable formatting library and standalone utilities keep the output compatible with many post-GWAS tools.
Availability and implementation: GWASLab is implemented in Python; the source code is publicly and freely available at https://github.com/Cloufield/gwaslab, and the documentation is available at https://cloufield.github.io/gwaslab/.
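
To make the standardization and QC steps named in the abstract more concrete, the following is a minimal sketch in plain Python. It does not use GWASLab and does not reproduce its API: the function names (`standardize_chrom`, `standardize_allele`, `qc_record`), the column names (CHR, POS, EA, NEA, P), and the numeric chromosome codes (23 for X, 24 for Y) are assumptions chosen for illustration only.

```python
# Illustrative sketch only; helper and column names are hypothetical, not GWASLab's API.
VALID_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}

# Assumed aliases; numeric codes for sex chromosomes differ between tools.
CHROM_ALIASES = {"23": "X", "24": "Y", "M": "MT"}


def standardize_chrom(raw):
    """Map notations such as 'chr1', 'CHR01', 'chrM' or 23 to '1'..'22', 'X', 'Y', 'MT'."""
    chrom = str(raw).strip().upper()
    if chrom.startswith("CHR"):
        chrom = chrom[3:]
    chrom = chrom.lstrip("0") or chrom  # drop zero padding such as '01'
    chrom = CHROM_ALIASES.get(chrom, chrom)
    return chrom if chrom in VALID_CHROMS else None


def standardize_allele(raw):
    """Upper-case an allele and accept it only if it consists of A, C, G, T."""
    allele = str(raw).strip().upper()
    return allele if allele and set(allele) <= set("ACGT") else None


def qc_record(rec):
    """Return a cleaned copy of one record, or None if it fails a sanity check."""
    chrom = standardize_chrom(rec["CHR"])
    ea = standardize_allele(rec["EA"])
    nea = standardize_allele(rec["NEA"])
    try:
        pos = int(rec["POS"])
        p = float(rec["P"])
    except (TypeError, ValueError):
        return None
    if chrom is None or ea is None or nea is None or ea == nea:
        return None
    # Positions must be positive; P values must lie in (0, 1] (NaN fails this test).
    if pos <= 0 or not (0.0 < p <= 1.0):
        return None
    return {"CHR": chrom, "POS": pos, "EA": ea, "NEA": nea, "P": p}


raw = [
    {"CHR": "chr01", "POS": "12345", "EA": "a", "NEA": "g", "P": "3e-8"},
    {"CHR": 23, "POS": 500, "EA": "T", "NEA": "TA", "P": 0.2},
    {"CHR": "chrUn", "POS": 10, "EA": "A", "NEA": "C", "P": 0.5},  # unknown contig
    {"CHR": "2", "POS": 77, "EA": "C", "NEA": "T", "P": 1.5},      # P out of range
]
clean = [r for r in map(qc_record, raw) if r is not None]
```

In this sketch the first two records pass and the last two are dropped. A real pipeline would more likely flag or log failing variants than discard them silently, and would also handle steps this sketch omits, such as variant normalization and harmonization against a reference.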

Conflict of interest disclosure

The authors have declared no conflicts of interest.

Status: Published


Submitted: 2023-04-28 08:36:45 UTC

Published: 2023-05-01 09:59:49 UTC
Research field: Biology, Life Sciences, and Basic Medicine