Using Wikidata’s Ontology in Practice: A Neuro-Symbolic, Community-Centred Workflow for Integrating and Reusing Humanities Datasets

Open Access | Dec 2025

Abstract

This article presents an ontology-centric workflow for adding, curating, analysing, and reusing humanities datasets with Wikidata. Anchored in a digital-humanities project that confronts data silos and gendered inequities, especially those related to Francoist repression and censorship, it combines a neuro-symbolic AI stack with a co-creative, community-driven methodology. We first operationalize “the ontology of Wikidata,” clarifying how classes, properties, and constraints behave in practice; how reified statements, qualifiers, and references guide modelling choices; and how Entity Schemas encode formal, testable expectations. We then detail an ontology-first pipeline that aligns local humanities databases with Wikidata through reconciliation (OpenRefine), controlled property usage (constraints and ShEx), and editing at scale (QuickStatements, Mix’n’match). The paper contributes reusable modelling patterns for persons, works, events, censorship decisions, and provenance that can be adapted across humanities domains. Using SPARQL in the Wikidata Query Service (WDQS), we demonstrate discovery and analysis alongside metrics for coverage, data quality, and equity, focusing on the visibility of women and other gender identities, and we situate the results within prior scholarship on gender gaps and on the ways Wikidata both mitigates and mirrors societal bias. Finally, we show how an ontology-backed knowledge graph (KG) underpins retrieval-augmented generation (RAG) to reduce large-language-model hallucination and bias, detailing how the HerStory NeSyAI project embeds Wikidata structures within an explainable neuro-symbolic architecture. Our contribution is both a case study and a template: a step-by-step, reproducible method that lets humanities teams use Wikidata’s ontology not merely to retrieve data but to publish, govern, and reuse datasets with verifiable, community-aligned semantics.
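The abstract refers to WDQS-based equity metrics without showing a query. The following is a minimal sketch and assumes that a team already holds the Wikidata QIDs of the persons in its reconciled dataset (the QIDs used here are placeholders). It builds a SPARQL query that counts humans (P31 = Q5) by sex or gender (P21), then converts a WDQS JSON response into per-value shares. The helper names `build_gender_query` and `gender_shares` and the query shape are hypothetical illustrations, not the authors' published metric. The parser assumes the W3C SPARQL 1.1 Query Results JSON format that WDQS returns.

```python
from collections import Counter
from typing import Iterable


def build_gender_query(qids: Iterable[str]) -> str:
    """Build a WDQS SPARQL query counting humans in `qids` by sex or gender (P21).

    Hypothetical helper: `qids` stands in for a project's reconciled person items.
    The wd:/wdt: prefixes are predefined in the Wikidata Query Service.
    """
    values = " ".join(f"wd:{q}" for q in qids)
    return (
        "SELECT ?gender (COUNT(DISTINCT ?person) AS ?n) WHERE {\n"
        f"  VALUES ?person {{ {values} }}\n"
        "  ?person wdt:P31 wd:Q5 .\n"
        "  OPTIONAL { ?person wdt:P21 ?g . }\n"
        # Items without any P21 statement are grouped under the literal "unknown".
        '  BIND(COALESCE(?g, "unknown") AS ?gender)\n'
        "}\n"
        "GROUP BY ?gender"
    )


def gender_shares(results: dict) -> dict[str, float]:
    """Turn a SPARQL 1.1 JSON result with bindings ?gender and ?n into shares per value."""
    counts = Counter()
    for row in results["results"]["bindings"]:
        # Entity IRIs end in the QID (e.g. .../entity/Q6581072); "unknown" is kept as-is.
        gender = row["gender"]["value"].rsplit("/", 1)[-1]
        counts[gender] += int(row["n"]["value"])
    total = sum(counts.values())
    # Shares are relative to the summed counts; an empty result yields an empty dict.
    return {g: c / total for g, c in counts.items()} if total else {}


if __name__ == "__main__":
    # Placeholder QIDs only; substitute the project's reconciled person items.
    print(build_gender_query(["Q1", "Q2"]))
```

A person with several P21 values is counted once for each value, so the shares are computed over the summed counts rather than over distinct persons. The "unknown" bucket is a useful coverage signal in its own right.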

DOI: https://doi.org/10.5334/johd.439 | Journal eISSN: 2059-481X
Language: English
Submitted on: Oct 26, 2025 | Accepted on: Dec 2, 2025 | Published on: Dec 30, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Miquel Centelles Velilla, Núria Ferran-Ferrer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.