Preprint / Version 1

GWASLab: a Python package for processing and visualizing GWAS summary statistics

Authors

  • Yunye He Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo https://orcid.org/0000-0001-8581-7826
  • Masaru Koido Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo https://orcid.org/0000-0002-0348-0666 https://researchmap.jp/mkoido
  • Yuka Shimmori Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo
  • Yoichiro Kamatani Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo

DOI:

https://doi.org/10.51094/jxiv.370

Keywords:

GWAS, Python, summary statistics, visualization, QC

Abstract

GWASLab is a comprehensive Python toolkit for processing and visualizing summary statistics (SumStats) derived from genome-wide association studies (GWAS). It provides functions for quality control (QC) of statistics, standardization of chromosome and allele notation, variant normalization, harmonization for meta-analysis, and data visualization. Because these functions are implemented as separate modules, users can assemble their own pipelines for working with SumStats. An expandable formatting library and standalone utilities help maintain compatibility with many post-GWAS tools.
Availability and implementation: GWASLab is implemented in Python; the source code is publicly and freely available at https://github.com/Cloufield/gwaslab, and the documentation is available at https://cloufield.github.io/gwaslab/.
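
To make two of the steps named in the abstract concrete, the following is a minimal sketch of chromosome-notation standardization and variant normalization in plain Python. It is an illustration under stated assumptions, not GWASLab's implementation or API: the function names `standardize_chrom` and `normalize_alleles` are hypothetical, and the normalization only trims bases shared by both alleles. It does not left-align indels against a reference genome, which a full normalization would also require.

```python
def standardize_chrom(chrom):
    """Return a chromosome label without a 'chr' prefix, in upper case.

    Hypothetical helper for illustration; not part of GWASLab.
    """
    label = str(chrom).strip().upper()
    if label.startswith("CHR"):
        label = label[3:]
    return label


def normalize_alleles(pos, ref, alt):
    """Trim bases shared by both alleles so the representation is minimal.

    Simplified sketch: no reference genome is consulted, so indels are
    not left-aligned. Returns the adjusted (pos, ref, alt).
    """
    ref, alt = ref.upper(), alt.upper()
    # Trim shared trailing bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim shared leading bases; each trimmed base shifts the position by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    print(standardize_chrom("chrX"))
    print(normalize_alleles(100, "ATCG", "ATG"))
```

Trimming trailing bases before leading ones keeps the reported position as far left as the alleles alone allow, so a deletion written as ATCG>ATG at position 100 becomes TC>T at position 101.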

Conflicts of Interest Disclosure

The authors have declared no conflicts of interest.

Posted

Submitted: 2023-04-28 08:36:45 UTC

Published: 2023-05-01 09:59:49 UTC

Section

Biology, Life Sciences & Basic Medicine