Are We There Yet? Notes Towards Benchmarking an Experimental AI-Assisted Workflow for Humanities Data Cleaning and Reconciliation

Open Access | Mar 2026

Abstract

This paper introduces a novel AI-assisted pipeline developed to prepare data for the European Research Council-funded project “STEMMA: Systems of Transmitting Early Modern Manuscript Verse, 1475–1700.” Now approaching its midpoint, STEMMA develops and applies a data-driven approach to provide the first comprehensive study of the circulation of early modern English poetry in manuscript. The project began by aggregating and reconciling five of the largest and most authoritative existing datasets about early modern verse circulation. The sheer volume of data, along with the need to preserve early modern English spelling and scribal idiosyncrasies for later analyses, meant that off-the-shelf data cleaning tools such as OpenRefine were not fit for purpose. Instead, our software developer created a staged pipeline to support the removal of duplicates, the creation of authorities, reconciliation, and the assignment of unique identifiers.
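
To make the stages named above concrete, the following is a minimal Python sketch of one way to group duplicate records and assign identifiers while leaving the original spelling untouched. It is not the STEMMA pipeline, which is AI-assisted and not specified here. The function names `match_key` and `reconcile`, the u/v and i/j folding rules, the `poem-0001` identifier format, and the dataset labels are all hypothetical and chosen only for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(text: str) -> str:
    """Build a comparison key; the input string itself is never modified."""
    # NFKD plus casefold maps, for example, long s to "s" and lowercases the text.
    folded = unicodedata.normalize("NFKD", text).casefold()
    # Drop combining marks left behind by the decomposition.
    folded = "".join(c for c in folded if not unicodedata.combining(c))
    # Assumed equivalences for illustration only: treat u/v and i/j as the same letter.
    folded = folded.translate(str.maketrans({"v": "u", "j": "i"}))
    # Collapse punctuation and whitespace into single spaces.
    return re.sub(r"[^a-z0-9]+", " ", folded).strip()


def reconcile(records):
    """Group (source, first_line) pairs by key and assign one identifier per group."""
    groups = defaultdict(list)
    for source, first_line in records:
        # The original spelling is stored as-is alongside its source.
        groups[match_key(first_line)].append((source, first_line))
    authorities = {}
    # Sorting the keys makes identifier assignment deterministic across runs.
    for n, (_, witnesses) in enumerate(sorted(groups.items()), start=1):
        authorities[f"poem-{n:04d}"] = witnesses  # hypothetical identifier format
    return authorities
```

Calling `reconcile` on a list of `(source, first_line)` pairs returns a dictionary from identifier to the witnesses grouped under it, each still in its original spelling. Exact-key matching of this kind catches only trivial variation; genuinely divergent spellings would need fuzzy or model-assisted matching, which this sketch does not attempt.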

The rapid and pragmatic way that this process was developed and deployed means that we did not take the time to benchmark it, nor is it feasible to do so retrospectively. However, this discussion paper records observations from this process and reflects on challenges and bottlenecks as well as opportunities. It points the way toward future benchmarks that are increasingly needed for novel applications of computational methods in the digital humanities. It also briefly considers the relationship between technical benchmarking and research project management.

DOI: https://doi.org/10.5334/johd.490 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 27, 2025 | Accepted on: Feb 20, 2026 | Published on: Mar 18, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Erin A. McCarthy, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.