Are We There Yet? Notes Towards Benchmarking an Experimental AI-Assisted Workflow for Humanities Data Cleaning and Reconciliation

Figures & Tables

Data sources for the STEMMA project.

SOURCE	CONTACT	FORMAT	DATA TYPE	DATA VOLUME
Catalogue of English Literary Manuscripts	John Lavagnino, King's College London	XML	Bibliographic dataset	103.9 MB (979 XML files)
DigitalDonne	Brent Nelson, University of Saskatchewan	CSV	Bibliographic dataset	1.2 MB (1 table, 4,240 lines)
Index of Selected English Poetry Manuscripts, 1590–1660	Joshua Eckhardt, Virginia Commonwealth University	CSV	Bibliographic dataset	1.8 MB (1 table, 9,178 lines)
Perdita Project: A Database for Early Modern Women’s Manuscript Compilations	Victoria Burke, University of Ottawa	HTML	Bibliographic dataset	25 MB (500 entries)
RECIRC: The Reception and Circulation of Early Modern Women’s Writing 1550–1700	Marie-Louise Coolahan, University of Galway	CSV	Bibliographic dataset	100 MB (170 tables)
Union First-Line Index of English Verse	Eric Johnson, Folger Shakespeare Library	CSV	Bibliographic dataset	247.3 MB (1 table, 704,321 rows)