Abstract
This article presents an ontology-centric workflow for adding, curating, analysing, and reusing humanities datasets with Wikidata. Anchored in a digital-humanities project that confronts data silos and gendered inequities, especially those related to Francoist repression and censorship, it combines a neuro-symbolic AI stack with a co-creative, community-driven methodology. We first operationalize “the ontology of Wikidata,” clarifying how classes, properties, and constraints behave in practice; how reified statements, qualifiers, and references guide modelling choices; and how Entity Schemas encode formal, testable expectations. We then detail an ontology-first pipeline that aligns local humanities databases to Wikidata through reconciliation (OpenRefine), controlled property usage (constraints and ShEx), and scaled editing (QuickStatements, Mix’n’match). The paper contributes reusable modelling patterns for persons, works, events, censorship decisions, and provenance that are adaptable across humanities domains. Using SPARQL in the Wikidata Query Service (WDQS), we demonstrate discovery and analysis alongside metrics for coverage, data quality, and equity, focusing on the visibility of women and other gender identities, and we situate the results within prior scholarship on gender gaps and on the ways Wikidata both mitigates and mirrors societal bias. Finally, we show how an ontology-backed knowledge graph (KG) underpins retrieval-augmented generation (RAG) to reduce large-language-model hallucination and bias, detailing how the HerStory NeSyAI project embeds Wikidata structures within an explainable neuro-symbolic architecture. Our contribution is both a case study and a template: a step-by-step, reproducible method that enables humanities teams to use Wikidata’s ontology not merely to retrieve data but to publish, govern, and reuse datasets with verifiable, community-aligned semantics.
