Preprint / Version 1

Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites

Authors

  • Hiroyuki Yamamoto, Japan Computational Mass Spectrometry (JCompMS) group

DOI:

https://doi.org/10.51094/jxiv.954

Keywords:

metabolite set enrichment analysis, missing value, metabolomics

Abstract

Over-representation analysis (ORA) is widely used to identify significant pathways in metabolomic data. However, traditional ORA approaches, such as those implemented in MetaboAnalyst, do not account for undetected metabolites. Because undetected metabolites are automatically classified as non-significant, the resulting p-values can be substantially biased. In this study, using fasting mouse metabolomic data, we developed a novel ORA method that leverages information from detected significant metabolites to estimate the possible range of p-values by considering all possible significance combinations among undetected metabolites. Furthermore, we introduced two probabilistic models, a binomial distribution to estimate the number of significant undetected metabolites and a beta distribution to model their proportion, which yield narrower p-value ranges. Finally, we applied a hierarchical Bayesian model that shares information across all pathways, resampling from the specified distributions to calculate ORA p-values; this approach achieved the narrowest p-value confidence intervals. These results highlight the importance of accounting for undetected metabolites in pathway enrichment analysis to enhance the reliability of ORA results.
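The abstract does not give the formulas, so the Python sketch below illustrates one plausible reading of the steps it describes. It is not the author's implementation. Several elements are assumptions made for illustration:

- Standard ORA is taken to be a one-sided hypergeometric test.
- The function names (`ora_pvalue`, `pvalue_given_j`, `pvalue_range`, `sample_pvalues`, `interval`) are hypothetical.
- The toy counts are invented.
- A significant undetected pathway member is assumed to raise both the pathway hit count and the universe-wide significant count.

`pvalue_range` enumerates every possible number j of significant undetected members. `sample_pvalues` resamples j either from a binomial with a plug-in rate or from a beta-binomial. Passing a pooled Beta prior is only a crude stand-in for sharing information across pathways; the hierarchical Bayesian model itself is not reproduced here.

```python
import math
import random


def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided ORA p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: metabolites in the universe, K: significant metabolites in the universe,
    n: metabolites in the pathway, k: significant metabolites in the pathway.
    """
    hits = sum(math.comb(K, x) * math.comb(N - K, n - x)
               for x in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def pvalue_given_j(N: int, S: int, m: int, s: int, j: int) -> float:
    # ASSUMPTION (hypothetical rule): j undetected pathway members are treated as
    # significant and added to both the pathway count s and the universe count S;
    # undetected metabolites outside this pathway stay non-significant.
    return ora_pvalue(N, S + j, m, s + j)


def pvalue_range(N: int, S: int, m: int, s: int, u: int) -> tuple[float, float]:
    """Smallest and largest p-value over j = 0..u significant undetected members."""
    ps = [pvalue_given_j(N, S, m, s, j) for j in range(u + 1)]
    return min(ps), max(ps)


def sample_pvalues(N, S, m, s, d, model="binomial", prior=(1.0, 1.0),
                   n_draws=2000, seed=0):
    """Resample j, the number of significant undetected members; return p-values.

    model="binomial": j ~ Binomial(u, s/d), a plug-in rate from detected members.
    model="beta": rate ~ Beta(a + s, b + d - s) with prior=(a, b), then
    j ~ Binomial(u, rate), i.e. a beta-binomial draw.
    """
    if d <= 0:
        raise ValueError("need at least one detected pathway member")
    rng = random.Random(seed)
    u = m - d
    a, b = prior
    out = []
    for _ in range(n_draws):
        if model == "binomial":
            rate = s / d
        else:
            rate = rng.betavariate(a + s, b + d - s)
        j = sum(rng.random() < rate for _ in range(u))  # one Binomial(u, rate) draw
        out.append(pvalue_given_j(N, S, m, s, j))
    return out


def interval(samples, lo=0.025, hi=0.975):
    """Empirical percentile interval (index-based, no interpolation)."""
    xs = sorted(samples)
    return xs[int(lo * (len(xs) - 1))], xs[int(hi * (len(xs) - 1))]


if __name__ == "__main__":
    # Toy counts, NOT data from the preprint.
    N, S = 200, 30      # universe size; significant detected metabolites overall
    m, d, s = 12, 8, 4  # pathway size; detected members; significant detected members
    u = m - d           # undetected pathway members
    print("traditional ORA p:", ora_pvalue(N, S, m, s))
    print("range over j = 0..u:", pvalue_range(N, S, m, s, u))
    print("binomial 95% interval:", interval(sample_pvalues(N, S, m, s, d)))
    # Pooled prior from a hypothetical global detected rate of 30/150, worth
    # 10 pseudo-counts: a crude way to share information across pathways.
    pooled = (10 * 30 / 150, 10 * (1 - 30 / 150))
    print("beta 95% interval:",
          interval(sample_pvalues(N, S, m, s, d, "beta", pooled)))
```

For a single pathway, this hypergeometric p-value depends only on how many undetected members are significant, not which ones. Enumerating j = 0..u therefore covers every combination within one pathway. Interactions between overlapping pathways are ignored in this sketch.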

Conflicts of Interest Disclosure

I have no conflicts of interest to disclose.

References

Chong, J., et al. (2018). MetaboAnalyst 4.0: Towards more transparent and integrative metabolomics analysis. Nucleic Acids Research, 46(W1), W486-W494.

Xia, J., & Wishart, D. S. (2016). Using MetaboAnalyst 3.0 for comprehensive metabolomics data analysis. Current Protocols in Bioinformatics, 55(1), 14.10.1-14.10.91.

Yamamoto, H., et al. (2014). Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis. BMC Bioinformatics, 15(1), 51.

Posted

Submitted: 2024-11-05 00:41:52 UTC

Published: 2024-11-07 10:07:17 UTC

Section:

Information Sciences