Datasheets for Digital Cultural Heritage Datasets

References

  1. Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Irollo, A., Lehmann, J., Neudecker, C., Osti, G., & van Strien, D. (2023). Datasheets for Digital Cultural Heritage Datasets. Version 1. (last accessed: October 4th, 2023). DOI: 10.5281/ZENODO.8375033
  2. Apte, P. (2017, September 27). The Data Scientist Putting Ethics Into AI. (last accessed: October 4th, 2023). https://web.archive.org/web/20170930075045/http://www.ozy.com/rising-stars/rumman-chowdhury-the-human-centric-thinker/81044
  3. Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. DOI: 10.1016/j.inffus.2019.12.012
  4. Beals, M., & Bell, E. (2020). The Atlas of Digitised Newspapers and Metadata: Reports from Oceanic Exchanges. (last accessed: October 4th, 2023). DOI: 10.6084/M9.FIGSHARE.11560059
  5. Beelen, K., Lawrence, J., Wilson, D. C. S., & Beavan, D. (2023). Bias and representativeness in digitized newspaper collections: Introducing the environmental scan. Digital Scholarship in the Humanities, 38(1), 1–22. DOI: 10.1093/llc/fqac037
  6. Brate, R., Nesterov, A., Vogelmann, V., van Ossenbruggen, J., Hollink, L., & van Erp, M. (2021). Capturing Contentiousness: Constructing the Contentious Terms in Context Corpus. Proceedings of the 11th on Knowledge Capture Conference, 17–24. DOI: 10.1145/3460210.3493553
  7. British Library, Morris, V., van Strien, D., Tolfo, G., Afric, L., Robertson, S., Tiney, P., Dogterom, A., & Wollner, I. (2021). 19th Century Books—Metadata with additional crowdsourced annotations. (last accessed: October 4th, 2023). DOI: 10.23636/BKHQ-0312
  8. Candela, G., Gabriëls, N., Chambers, S., Pham, T.-A., Ames, S., Fitzgerald, N., Hofmann, K., Harbo, V., Potter, A., Ferriter, M., Manchester, E., Irollo, A., Van Keer, E., Mahey, M., Holownia, O., & Dobreva, M. (2023). A Checklist to Publish Collections as Data in GLAM Institutions. (last accessed: October 4th, 2023). DOI: 10.48550/ARXIV.2304.02603
  9. Conway, P. (2015). Digital transformations and the archival nature of surrogates. Archival Science, 15(1), 51–69. DOI: 10.1007/s10502-014-9219-z
  10. Corrado, E. M., & Moulaison Sandy, H. L. (2017). Digital preservation for libraries, archives, and museums (Second Edition). Rowman & Littlefield.
  11. Devaraju, A., Huber, R., Mokrane, M., Herterich, P., Cepinskas, L., de Vries, J., L’Hours, H., Davidson, J., & White, A. (2020). FAIRsFAIR Data Object Assessment Metrics. (last accessed: October 4th, 2023). DOI: 10.5281/ZENODO.4081213
  12. Edmond, J., & Lehmann, J. (2021). Digital humanities, knowledge complexity, and the five ‘aporias’ of digital research. Digital Scholarship in the Humanities, 36(Supplement_2), ii95–ii108. DOI: 10.1093/llc/fqab031
  13. Fiorucci, M., Khoroshiltseva, M., Pontil, M., Traviglia, A., Del Bue, A., & James, S. (2020). Machine Learning for Cultural Heritage: A Survey. Pattern Recognition Letters, 133, 102–108. DOI: 10.1016/j.patrec.2020.02.017
  14. Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for Datasets. Communications of the ACM, 64(12), 86–92. DOI: 10.1145/3458723
  15. Holstein, K., Wortman Vaughan, J., Daumé, H., Dudik, M., & Wallach, H. (2019). Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–16. DOI: 10.1145/3290605.3300830
  16. Hubbard, D. W. (2010). How to measure anything: Finding the value of ‘intangibles’ in business (Second Edition). Hoboken, NJ: Wiley. DOI: 10.1002/9781118983836
  17. Jo, E. S., & Gebru, T. (2020). Lessons from archives: Strategies for collecting sociocultural data in machine learning. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 306–316. DOI: 10.1145/3351095.3372829
  18. Kapoor, S., & Narayanan, A. (2022). Leakage and the Reproducibility Crisis in ML-based Science. (arXiv:2207.07048). (last accessed: October 4th, 2023). DOI: 10.48550/ARXIV.2207.07048; 10.1016/j.patter.2023.100804
  19. Kirk, H. R., Birhane, A., Vidgen, B., & Derczynski, L. (2022). Handling and Presenting Harmful Text in NLP Research. (arXiv:2204.14256). (last accessed: October 4th, 2023). DOI: 10.48550/ARXIV.2204.14256; 10.18653/v1/2022.findings-emnlp.35
  20. Lee, B. C. G. (2023). The “Collections as ML Data” Checklist for Machine Learning & Cultural Heritage. Journal of the Association for Information Science and Technology, 1–22. DOI: 10.1002/asi.24765
  21. Library of Congress. (2019, December). Encoded Archival Description Tag Library Version EAD3 1.1.1. (last accessed: October 4th, 2023). https://www.loc.gov/ead/EAD3taglib/EAD3.html
  22. Luthra, M., Todorov, K., Jeurgens, C., & Colavizza, G. (2022a). Unsilencing Colonial Archives via Automated Entity Recognition (arXiv:2210.02194). (last accessed: October 4th, 2023). DOI: 10.48550/ARXIV.2210.02194
  23. Luthra, M., Todorov, K., Wissen, L. van, Jeurgens, C., & Colavizza, G. (2022b). Unsilencing Colonial Archives via Automated Entity Recognition. (last accessed: October 4th, 2023). DOI: 10.5281/zenodo.7129316; 10.1108/JD-02-2022-0038
  24. O’Neil, L. (2023, August 12). These Women Tried to Warn Us About AI. Rolling Stone. (last accessed: October 4th, 2023). Retrieved from https://www.rollingstone.com/culture/culture-features/women-warnings-ai-danger-risk-before-chatgpt-1234804367/
  25. Padilla, T. (2019). Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. DOI: 10.25333/XK7Z-9G97
  26. Padilla, T., Allen, L., Frost, H., Potvin, S., Roke, E., & Varner, S. (2022). Always Already Computational: Collections as Data. (last accessed: October 4th, 2023). DOI: 10.17605/OSF.IO/MX6UK; 10.1108/JD-02-2022-0038
  27. Porter, T. M. (1996). Trust in Numbers. The Pursuit of Objectivity in Science and Public Life. Princeton: Princeton University Press. DOI: 10.1515/9780691210544
  28. Pushkarna, M., Zaldivar, A., & Kjartansson, O. (2022). Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. 2022 ACM Conference on Fairness, Accountability, and Transparency, 1776–1826. DOI: 10.1145/3531146.3533231
  29. Rakova, B., Yang, J., Cramer, H., & Chowdhury, R. (2021). Where Responsible AI meets Reality: Practitioner Perspectives on Enablers for Shifting Organizational Practices. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1), 7:1–7:23. DOI: 10.1145/3449081
  30. Reshetnikov, A., Marinescu, M.-C., & Lopez, J. M. (2022). DEArt: Dataset of European Art (arXiv:2211.01226). (last accessed: October 4th, 2023). DOI: 10.48550/arXiv.2211.01226
  31. Scheuerman, M. K., Hanna, A., & Denton, E. (2021). Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–37. DOI: 10.1145/3476058
  32. Singh, A. (2019). Beyond the Archive Gap: The Kiplings and the Famines of British Colonial India. South Asian Review, 40(3), 237–251. DOI: 10.1080/02759527.2019.1599562
  33. UNESCO. (2003, March). UNESCO Charter on the Preservation of the Digital Heritage—UNESCO Digital Library. (last accessed: October 4th, 2023). Retrieved from https://unesdoc.unesco.org/ark:/48223/pf0000229034.locale=en. DOI: 10.1007/978-3-031-25056-9_15
  34. Urton, G. (1997). The Social Life of Numbers. A Quechua Ontology of Numbers and Philosophy of Arithmetic (First Edition). Austin: University of Texas Press.
  35. Van Erp, J. A. A., Langen, C. D., Boon, A., & Van Bochove, K. (2018). Testing the FAIR metrics on data catalogs. PeerJ Preprints, 6, e27151v2. DOI: 10.7287/peerj.preprints.27151v2
  36. Wevers, M. (2022). Fotopersbureau De Boer Training Set on Scene Detection (0.2). (last accessed: October 4th, 2023). DOI: 10.5281/zenodo.7118409
  37. Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., Da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. DOI: 10.1038/sdata.2016.18
  38. Wilkinson, M. D., Sansone, S.-A., Schultes, E., Doorn, P., Bonino Da Silva Santos, L. O., & Dumontier, M. (2018). A design framework and exemplar metrics for FAIRness. Scientific Data, 5(1), 180118. DOI: 10.1038/sdata.2018.118
DOI: https://doi.org/10.5334/johd.124 | Journal eISSN: 2059-481X
Language: English
Submitted on: Jul 28, 2023
Accepted on: Sep 26, 2023
Published on: Oct 30, 2023
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Henk Alkemade, Steven Claeyssens, Giovanni Colavizza, Nuno Freire, Jörg Lehmann, Clemens Neudecker, Giulia Osti, Daniel van Strien, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.