Abstract
This discussion paper introduces a modular data model and a Python-based workflow for transforming structured data from Wikidata into a formal knowledge graph grounded in the CIDOC Conceptual Reference Model (CRM), the object-oriented Library Reference Model (LRMoo) of the International Federation of Library Associations and Institutions (IFLA), and INTRO (the Intertextual, Interpictorial, and Intermedial Relations Ontology). Designed especially for digital comparative literary studies, the workflow supports the semantic modeling of information about authors, works, and – most importantly – intertextual phenomena automatically derived from Wikidata. It does so by employing established standards in the humanities while keeping interpretive attributions explicit: textual features and intertextual relations are framed as scholarly interpretations rather than objective facts. During post-processing, a mapping and alignment strategy integrates identifiers and connections to other ontologies relevant to literary studies and the modeling of textual relations. The resulting graph thus renders Wikidata’s heterogeneous data in a more stable and interoperable form for humanities research – and especially comparative literary studies – by grounding it in standards used in GLAM institutions (galleries, libraries, archives, and museums) and across the Digital Humanities, and by further aligning it with complementary ontologies. This paper presents the data model, expressed in the Web Ontology Language (OWL) and accompanied by validation shapes in the Shapes Constraint Language (SHACL); introduces the workflow, released as the Python package wiki2crm; and discusses challenges and opportunities, with a focus on working with Wikidata and potential future developments.
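To illustrate the validation step mentioned above, the following minimal sketch shows how a graph produced by such a workflow might be checked against the accompanying SHACL shapes. It uses the generic rdflib and pySHACL libraries rather than the wiki2crm API itself, and the file names output_graph.ttl and model_shapes.ttl are hypothetical placeholders.

```python
# Minimal sketch (not the wiki2crm API): validating a derived RDF graph
# against SHACL shapes with rdflib and pySHACL. File names are hypothetical.
from rdflib import Graph
from pyshacl import validate

data_graph = Graph().parse("output_graph.ttl", format="turtle")    # graph derived from Wikidata
shapes_graph = Graph().parse("model_shapes.ttl", format="turtle")  # SHACL validation shapes

conforms, _, report_text = validate(
    data_graph,
    shacl_graph=shapes_graph,
    inference="rdfs",       # expand RDFS entailments before checking constraints
    abort_on_first=False,   # collect all violations rather than stopping early
)
print("Conforms:", conforms)
if not conforms:
    print(report_text)      # human-readable validation report
```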
