
Bridging the Gaps: Integrating Bibliographic Metadata Into Wikidata for Literary Corpora

Open Access | Feb 2026

Abstract

This paper presents a case study on enhancing literary-corpus metadata by integrating large-scale bibliographic resources with Wikidata. Digital libraries such as Project Gutenberg or HathiTrust often provide only minimal metadata (e.g., author name and title). For large-scale literary analysis, however, it is crucial to include additional information such as year of publication, author gender, genre, or publisher. Yet using Wikidata to enrich existing literary-corpus metadata is itself challenging, because significant gaps in its coverage remain. In this case study, we draw on the metadata of a large literary corpus to address these gaps. We conduct a feasibility analysis to determine how a workflow can be established that integrates metadata from bibliographic catalogues into Wikidata as a step in the digital-humanities pipeline. We explore both procedural approaches and existing software tools and discuss the resulting challenges and limitations. Our methods are documented and open-source; the full Python scripts and data-processing workflows are publicly available on GitHub.¹ The goal is to develop reproducible methods for sharing and improving metadata availability across open platforms.
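To make the idea of "filling gaps" concrete, the following is a minimal sketch, not the authors' published scripts. It assumes that each corpus text has already been matched to a Wikidata item and that the property IDs already present on that item are known. It then lists catalogue fields that Wikidata lacks and, for the publication year, emits one line in the QuickStatements V1 tab-separated format. The `CorpusRecord` structure, the helper names, and the choice of QuickStatements as the upload tool are all hypothetical illustrations. The property IDs P577 (publication date), P136 (genre), and P123 (publisher) are standard Wikidata properties.

```python
from dataclasses import dataclass, field
from typing import Optional

# Work-level Wikidata properties for fields named in the abstract.
# Author gender (P21) belongs on the author's item, not the work's, so it is omitted.
WORK_PROPERTIES = {
    "year": "P577",       # publication date
    "genre": "P136",      # genre
    "publisher": "P123",  # publisher
}


@dataclass
class CorpusRecord:
    """One corpus text already matched to a Wikidata item (hypothetical structure)."""
    qid: str                                          # matched Wikidata item ID
    catalogue: dict                                   # fields from a bibliographic catalogue
    wikidata_props: set = field(default_factory=set)  # property IDs already on the item


def find_gaps(record: CorpusRecord) -> list[str]:
    """Return the fields the catalogue provides but the Wikidata item lacks."""
    return [
        name
        for name, pid in WORK_PROPERTIES.items()
        if name in record.catalogue and pid not in record.wikidata_props
    ]


def year_statement(record: CorpusRecord) -> Optional[str]:
    """Build one QuickStatements V1 line adding a year-precision publication date."""
    if "year" not in find_gaps(record):
        return None  # nothing to add: no catalogue year, or Wikidata already has P577
    year = int(record.catalogue["year"])
    # Time values use +YYYY-MM-DDT00:00:00Z/precision; precision 9 means "year".
    return f"{record.qid}\tP577\t+{year:04d}-00-00T00:00:00Z/9"


if __name__ == "__main__":
    # "Q12345" is a placeholder item ID, not a real match from the corpus.
    rec = CorpusRecord(qid="Q12345",
                       catalogue={"year": 1851, "publisher": "Harper"},
                       wikidata_props={"P123"})
    print(find_gaps(rec))       # ['year']
    print(year_statement(rec))
```

Genre and publisher are only reported as gaps here, not written. Their Wikidata values are items rather than literals, so they would first need to be reconciled against existing Wikidata entities, which is where much of the practical difficulty of such a workflow lies.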

DOI: https://doi.org/10.5334/johd.483 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 17, 2025 | Accepted on: Jan 9, 2026 | Published on: Feb 27, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Katrin Rohrbacher, David Schrittesser, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.