Digital Healing: Metadata and Documentation for Health Web Archives

Open Access | Feb 2025

Abstract

The Archive of Tomorrow project (2022–2023), funded by Wellcome, set out to archive health-related discourse on the internet. This collaborative effort across multiple institutions contributed to the UK Web Archive and explored ways to make the resulting collection more accessible to both digital researchers and the wider public. The project also investigated the concept of treating web archive collections as data. This paper examines the documentation required to enable such collections to be used as data, particularly through the creation of a Datasheet for Datasets and through the Data Foundry project, both aimed at making web archives machine-readable.

The paper discusses the types of data and support that archivists provided to researchers while navigating legal restrictions. It also highlights the challenges of processing the data so that it is accessible to a non-technical audience, including the difficulties of scaling and of handling sensitive health-related content.

Finally, the paper outlines work on the processing pipeline required to make the material accessible to a broader audience, emphasising that providing datasets and documentation alone is insufficient. It also raises concerns about the paradox of turning initially unstructured web content into structured datasets for both archival and user-interaction purposes. The project contributes to an understanding of how web archives can be transformed for greater accessibility and research usability.

DOI: https://doi.org/10.5334/johd.272 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 4, 2024
Accepted on: Jan 9, 2025
Published on: Feb 20, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Leontien K. Talboom, Andrea Kocsis, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.