Effective Storytelling of Genomic Datasets through Visualization
DOI:
https://doi.org/10.51094/jxiv.678Keywords:
Data Visualization, Genetics, Genomics, DataViz, AI ToolsAbstract
Genomic data are inherently multidimensional and complex, therefore, presenting researcher with significant challenges in analysis and interpretation. Data visualization of genomic datasets can unravel the complexity and provide meaningful insights for effective communication. Here, we discuss that, in data-driven genomic studies, effective storytelling of formulated hypotheses can be significantly enhanced by using suitable data visualization tools. Further, with the ongoing advancement of technology, we argue that, the integration of these tools with artificial intelligence or machine learning concepts could potentially revolutionize the visualization trends within the field of genomic research.
Conflicts of Interest Disclosure
The authors declare no competing interests.Downloads *Displays the aggregated results up to the previous day.
References
Abbasi, S., & Masoumi, S. (2020). Next-generation sequencing (NGS). International Journal of Advanced Science and Technology, 29(3). https://doi.org/10.1007/978-981-16-1037-0_23
Auton, A., Abecasis, G. R., Altshuler, D. M., Durbin, R. M., Bentley, D. R., Chakravarti, A., Clark, A. G., Donnelly, P., Eichler, E. E., Flicek, P., Gabriel, S. B., Gibbs, R. A., Green, E. D., Hurles, M. E., Knoppers, B. M., Korbel, J. O., Lander, E. S., Lee, C., Lehrach, H., … Schloss, J. A. (2015). A global reference for human genetic variation. In Nature (Vol. 526, Issue 7571). https://doi.org/10.1038/nature15393
Brehmer, M., & Munzner, T. (2013). A multi-level typology of abstract visualization tasks. IEEE Transactions on Visualization and Computer Graphics, 19(12). https://doi.org/10.1109/TVCG.2013.124
Cheng, J., Novati, G., Pan, J., Bycroft, C., Žemgulyte, A., Applebaum, T., Pritzel, A., Wong, L. H., Zielinski, M., Sargeant, T., Schneider, R. G., Senior, A. W., Jumper, J., Hassabis, D., Kohli, P., & Avsec, Ž. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science, 381(6664). https://doi.org/10.1126/science.adg7492
Cumsille, A., Durán, R. E., Rodríguez-Delherbe, A., Saona-Urmeneta, V., Cámara, B., Seeger, M., Araya, M., Jara, N., & Buil-Aranda, C. (2023). GenoVi, an open-source automated circular genome visualizer for bacteria and archaea. PLoS Computational Biology, 19(4). https://doi.org/10.1371/JOURNAL.PCBI.1010998
Durant, E., Rouard, M., Ganko, E. W., Muller, C., Cleary, A. M., Farmer, A. D., Conte, M., & Sabot, F. (2022a). Ten simple rules for developing visualization tools in genomics. PLoS Computational Biology, 18(11). https://doi.org/10.1371/journal.pcbi.1010622
Durant, E., Rouard, M., Ganko, E. W., Muller, C., Cleary, A. M., Farmer, A. D., Conte, M., & Sabot, F. (2022b). Ten simple rules for developing visualization tools in genomics. PLOS Computational Biology, 18(11), e1010622. https://doi.org/10.1371/journal.pcbi.1010622
Genereux, D. P., Serres, A., Armstrong, J., Johnson, J., Marinescu, V. D., Murén, E., Juan, D., Bejerano, G., Casewell, N. R., Chemnick, L. G., Damas, J., Di Palma, F., Diekhans, M., Fiddes, I. T., Garber, M., Gladyshev, V. N., Goodman, L., Haerty, W., Houck, M. L., … Karlsson, E. K. (2020). A comparative genomics multitool for scientific discovery and conservation. Nature, 587(7833). https://doi.org/10.1038/s41586-020-2876-6
Goel, M., & Schneeberger, K. (2022). plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics, 38(10), 2922–2926. https://doi.org/10.1093/bioinformatics/btac196
Goodstadt, M., & Marti-Renom, M. A. (2017). Challenges for visualizing three-dimensional data in genomic browsers. In FEBS Letters (Vol. 591, Issue 17). https://doi.org/10.1002/1873-3468.12778
Guo, K., Wu, M., Soo, Z., Yang, Y., Zhang, Y., Zhang, Q., Lin, H., Grosser, M., Venter, D., Zhang, G., & Lu, J. (2023). Artificial intelligence-driven biomedical genomics. Knowledge-Based Systems, 279, 110937. https://doi.org/10.1016/j.knosys.2023.110937
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873). https://doi.org/10.1038/s41586-021-03819-2
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S. J., & Marra, M. A. (2009). Circos: An information aesthetic for comparative genomics. Genome Research, 19(9). https://doi.org/10.1101/gr.092759.109
Langer, C. C. H., Mitter, M., Stocsits, R. R., & Gerlich, D. W. (2023). HiCognition: a visual exploration and hypothesis testing tool for 3D genomics. Genome Biology, 24(1). https://doi.org/10.1186/s13059-023-02996-9
Li, F., Hu, H., Xiao, Z., Wang, J., Liu, J., Zhao, D., Fu, Y., Wang, Y., Yuan, X., Bu, S., Zhou, X., Zhao, J., & Wang, S. (2023). Visualization and review of reads alignment on the graphical pan-genome with VAG. BioRxiv. https://doi.org/10.1101/2023.01.20.524849
Li, W., Zhang, S., Liu, C. C., & Zhou, X. J. (2012). Identifying multi-layer gene regulatory modules from multi-dimensional genomic data. Bioinformatics, 28(19). https://doi.org/10.1093/bioinformatics/bts476
Liu, Q., Chen, J., Wang, Y., Li, S., Jia, C., Song, J., & Li, F. (2021). DeepTorrent: A deep learning-based approach for predicting DNA N4-methylcytosine sites. Briefings in Bioinformatics, 22(3). https://doi.org/10.1093/bib/bbaa124
L’Yi, S., Wang, Q., Lekschas, F., & Gehlenborg, N. (2022). Gosling: A Grammar-based Toolkit for Scalable and Interactive Genomics Data Visualization. IEEE Transactions on Visualization and Computer Graphics, 28(1). https://doi.org/10.1109/TVCG.2021.3114876
Malinda, R. R. (2023). Biological data studies, scale-up the potential with machine learning. In European Journal of Human Genetics (Vol. 31, Issue 6). https://doi.org/10.1038/s41431-023-01361-5
Meyer, M., Sedlmair, M., & Munzner, T. (2012). The four-level nested model revisited: Blocks and guidelines. ACM International Conference Proceeding Series. https://doi.org/10.1145/2442576.2442587
Munzner, T. (2014). Visualization Analysis & Design. In Visualization Analysis and Design. https://doi.org/10.1201/b17511
Munzner T. (2014). Visualization analysis and design. CRC Press.
Nielsen, C. B., Cantor, M., Dubchak, I., Gordon, D., & Wang, T. (2010). Visualizing genomes: Techniques and challenges. In Nature Methods (Vol. 7, Issue 3). https://doi.org/10.1038/nmeth.1422
Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., Vollger, M. R., Altemose, N., Uralsky, L., Gershman, A., Aganezov, S., Hoyt, S. J., Diekhans, M., Logsdon, G. A., Alonge, M., Antonarakis, S. E., Borchers, M., Bouffard, G. G., Brooks, S. Y., … Phillippy, A. M. (2022). The complete sequence of a human genome. Science, 376(6588). https://doi.org/10.1126/science.abj6987
Nusrat, S., Harbig, T., & Gehlenborg, N. (2019). Tasks, techniques, and tools for genomic data visualization. Computer Graphics Forum, 38(3). https://doi.org/10.1111/cgf.13727
O’Donoghue, S. I. (2021). Grand Challenges in Bioinformatics Data Visualization. Frontiers in Bioinformatics, 1. https://doi.org/10.3389/fbinf.2021.669186
Parsons, P. (2022). Understanding Data Visualization Design Practice. IEEE Transactions on Visualization and Computer Graphics, 28(1). https://doi.org/10.1109/TVCG.2021.3114959
Pearce, T. M., Nikiforova, M. N., & Roy, S. (2019). Interactive Browser-Based Genomics Data Visualization Tools for Translational and Clinical Laboratory Applications. Journal of Molecular Diagnostics, 21(6). https://doi.org/10.1016/j.jmoldx.2019.06.005
Qu, Z., Lau, C. W., Nguyen, Q. V., Zhou, Y., & Catchpoole, D. R. (2019). Visual Analytics of Genomic and Cancer Data: A Systematic Review. In Cancer Informatics (Vol. 18). https://doi.org/10.1177/1176935119835546
Rhie, A., Nurk, S., Cechova, M., Hoyt, S. J., Taylor, D. J., Altemose, N., Hook, P. W., Koren, S., Rautiainen, M., Alexandrov, I. A., Allen, J., Asri, M., Bzikadze, A. V., Chen, N. C., Chin, C. S., Diekhans, M., Flicek, P., Formenti, G., Fungtammasan, A., … Phillippy, A. M. (2023). The complete sequence of a human Y chromosome. Nature, 621(7978). https://doi.org/10.1038/s41586-023-06457-y
Singh, R., Lanchantin, J., Robins, G., & Qi, Y. (2016). DeepChrome: Deep-learning for predicting gene expression from histone modifications. Bioinformatics, 32(17). https://doi.org/10.1093/bioinformatics/btw427
Tanjo, T., Kawai, Y., Tokunaga, K., Ogasawara, O., & Nagasaki, M. (2021). Practical guide for managing large-scale human genome data in research. Journal of Human Genetics, 66(1), 39–52. https://doi.org/10.1038/s10038-020-00862-1
Tumescheit, C., Firth, A. E., & Brown, K. (2022). CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments. PeerJ. https://doi.org/10.7717/peerj.12983
Uffelmann, E., Huang, Q. Q., Munung, N. S., de Vries, J., Okada, Y., Martin, A. R., Martin, H. C., Lappalainen, T., & Posthuma, D. (2021). Genome-wide association studies. In Nature Reviews Methods Primers (Vol. 1, Issue 1). Springer Nature. https://doi.org/10.1038/s43586-021-00056-9
Wang, X., Wu, Z., Huang, W., Wei, Y., Huang, Z., Xu, M., & Chen, W. (2023). VIS+AI: integrating visualization with artificial intelligence for efficient data analysis. Frontiers of Computer Science, 17(6). https://doi.org/10.1007/s11704-023-2691-y
Wang, Y., Song, F., Zhang, B., Zhang, L., Xu, J., Kuang, D., Li, D., Choudhary, M. N. K., Li, Y., Hu, M., Hardison, R., Wang, T., & Yue, F. (2018). The 3D Genome Browser: A web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biology, 19(1). https://doi.org/10.1186/s13059-018-1519-9
Wojcik, G. L., Graff, M., Nishimura, K. K., Tao, R., Haessler, J., Gignoux, C. R., Highland, H. M., Patel, Y. M., Sorokin, E. P., Avery, C. L., Belbin, G. M., Bien, S. A., Cheng, I., Cullina, S., Hodonsky, C. J., Hu, Y., Huckins, L. M., Jeff, J., Justice, A. E., … Carlson, C. S. (2019). Genetic analyses of diverse populations improves discovery for complex traits. Nature, 570(7762). https://doi.org/10.1038/s41586-019-1310-4
Wong, B. (2012). Visualizing biological data. Nature Methods, 9(12). https://doi.org/10.1038/nmeth.2258
Wong, K. C. (2019). Big data challenges in genome informatics. In Biophysical Reviews (Vol. 11, Issue 1). https://doi.org/10.1007/s12551-018-0493-5
Xu, W., Zhong, Q., Lin, D., Zuo, Y., Dai, J., Li, G., & Cao, G. (2021). CoolBox: a flexible toolkit for visual analysis of genomics data. BMC Bioinformatics, 22(1). https://doi.org/10.1186/s12859-021-04408-w
Yokoyama, T. T., & Kasahara, M. (2020). Visualization tools for human structural variations identified by whole-genome sequencing. In Journal of Human Genetics (Vol. 65, Issue 1). https://doi.org/10.1038/s10038-019-0687-0
Yuan, Y., Shi, Y., Li, C., Kim, J., Cai, W., Han, Z., & Feng, D. D. (2016). Deepgene: An advanced cancer type classifier based on deep learning and somatic point mutations. BMC Bioinformatics, 17. https://doi.org/10.1186/s12859-016-1334-9
Zhou, L., Feng, T., Xu, S., Gao, F., Lam, T. T., Wang, Q., Wu, T., Huang, H., Zhan, L., Li, L., Guan, Y., Dai, Z., & Yu, G. (2022). Ggmsa: A visual exploration tool for multiple sequence alignment and associated data. Briefings in Bioinformatics, 23(4). https://doi.org/10.1093/bib/bbac222
Downloads
Posted
Submitted: 2024-04-23 23:18:29 UTC
Published: 2024-04-26 01:26:16 UTC
License
Copyright (c) 2024
Raj Rajeshwar Malinda
Dipika Mishra
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.