Abstract
The increasing use of artificial intelligence (AI)-based approaches such as machine learning (ML) across diverse scientific fields presents challenges for reproducibly disseminating and assessing research. As ML becomes integral to a growing range of computationally intensive applications (e.g., clinical research), there is a critical need for transparent reporting methods that ensure both the comprehensibility and the reproducibility of the supporting studies. A growing number of standards, checklists, and guidelines enable more standardised reporting of ML research, but their proliferation and complexity make them challenging to use, particularly in assessment and peer review. To date, peer review has been an ad hoc process that struggles to shed light on increasingly complex supporting computational methods, which are otherwise unintelligible to other researchers. To take the publication process beyond these black boxes, GigaScience Press has experimented with integrating many of these ML standards into its publication workflows. The press's broad scope necessitated more generalist and automated approaches. Here, we map the current landscape of AI standards and outline our adoption of the Data, Optimization, Model, Evaluation (DOME) recommendations for ML in biology. We developed a publishing workflow that integrates the DOME Data Stewardship Wizard (DOME-DSW) and DOME Registry tools into the peer review and publication process. From this publisher's case study, we provide journal authors, reviewers, and editors with examples of approaches, workflows, and strategies for disseminating and reviewing ML research more systematically. Our experience demonstrates the need for continued dialogue and collaboration among the various ML communities to create unified, comprehensive standards and to enhance the credibility, sustainability, and impact of ML-based scientific research.
