A Lightweight File System Based Approach to Getting Data Ready for Data Management Solutions

Open Access | Apr 2025

Language: English
Submitted on: Nov 7, 2024
Accepted on: Mar 11, 2025
Published on: Apr 21, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Albert K. Engstfeld, Johannes M. Hermann, Nicolas G. Hörmann, Julian Rüth, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.