(1) Introduction
(1.1) From Metadata Dungeon to Linked Ecosystem
Every cultural heritage data curator knows this scenario: a digital archive fit for a medieval dungeon, haunted by unlinked identifiers, estranged entities, and duplicates that seem to regenerate as fast as they are slain. Working with Bildindex der Kunst und Architektur,1 one of Germany’s most important image collections, hosted by Deutsches Dokumentationszentrum für Kunstgeschichte – Bildarchiv Foto Marburg (DDK), can sometimes feel like a long-running campaign in Dungeons & Dragons: behind every door you open (resolving a duplicate, fixing an identifier) may lie a new challenge to battle; yet often enough, unexpected hidden treasures of clarity, provenance, and enriched connections emerge.
This paper explores how cultural heritage institutions can use external identifier properties in Wikidata as enchanted keys in this labyrinth, transforming fragile metadata islands into sustainable, linked networks. External identifier properties link items on Wikidata to corresponding entities in other web portals. Drawing on the experiences of DDK as well as those of the National Historical Museums of Sweden (through the project “Usable Authorities for Data-driven Cultural Heritage Research”)2, this paper discusses methods, challenges, and limitations in integrating authority data with Wikidata, using tools like OpenRefine and leveraging community processes. The analysis is meant to be both playful and serious: the metaphor of metadata dungeons helps visualise the challenges, while the methodological grounding ensures academic substance.
(1.2) Workflow and Dataset
The empirical foundation of this paper is a long-term initiative at DDK to reconcile Bildindex metadata with Wikidata. Rather than a bounded project, this work has evolved into an ongoing campaign to establish sustainable authority connections, coupled with the need to carry out internal optimisations on the database. The turn toward Wikidata occurred after repeated attempts to integrate architectural datasets into the GND (Gemeinsame Normdatei)3 revealed systemic obstacles. While GND may be well suited to personal names and corporate bodies, it contains only a marginal number of architectural entities (Table 1).4
Table 1
Distribution of entities within GND retrieved via DNB SPARQL Service.
| PERSONS | CORPORATE BODIES | WORKS | BUILDINGS |
|---|---|---|---|
| 6,496,803 | 1,285,915 | 239,782 | 76,997 |
This situation prompted the exploration of Wikidata as a more flexible, collaborative, and semantically expressive environment.
A key step in this transition involved addressing legacy identifiers already circulating in Wikidata. An older, community-created Bildindex ID property (P2092) existed, but because it was based on the Bildindex URL it had two weaknesses: it was neither persistent nor able to point to a description at a narrower level of a hierarchically structured object record.5 Using SPARQL, all items carrying this outdated identifier were identified. In a systematic review, hierarchical ambiguities were resolved: items previously linked to composite structures were redirected to precise sub-entities (e.g., individual retable wings, specific buildings within ensembles), and a new external identifier property P127546 was added consistently across these items using OpenRefine. This process ensured both historical transparency – retaining knowledge of the former identifier – and semantic precision in current modelling.
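The first step described above – collecting every item that still carries the legacy identifier – can be sketched as a SPARQL query assembled in Python. The query below is an illustrative reconstruction, not the exact query used at DDK, and the helper name `build_legacy_query` is hypothetical.

```python
# Illustrative reconstruction (assumption, not DDK's actual query) of a
# SPARQL query listing Wikidata items that still carry the legacy
# Bildindex ID (P2092) but not yet the new Bildindex PID (P12754).
LEGACY_QUERY = """
SELECT ?item ?legacyId WHERE {
  ?item wdt:P2092 ?legacyId .                     # legacy, URL-based ID
  FILTER NOT EXISTS { ?item wdt:P12754 ?pid . }   # not yet migrated
}
"""

def build_legacy_query() -> str:
    """Return the query text that could be sent to the Wikidata Query Service."""
    return LEGACY_QUERY.strip()

print(build_legacy_query())
```

The result set of such a query would form the worklist for the systematic review, with each hit examined for hierarchical ambiguity before the new PID is assigned in OpenRefine.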
Reconciliation then continued in thematic work packages, each concentrating on a different architectural data corpus. Of roughly 22,000 original records in a dataset from the German state of Hesse, about 6,000 were initially marked as suitable for reconciliation after filtering out entries with incomplete address or name data. Of these, around 400 entirely new items were created in Wikidata to fill gaps where no existing entity could be matched.
As of October 2025, 11,697 Wikidata items – for buildings and artworks – carry a Bildindex PID7 (P12754), linking them to corresponding records in the institutional database. Each identifier resolves through a persistent formatter URL, making every connection traceable to its original record within Bildindex.
(2) External Identifier Properties: The Compass Rose of Metadata
(2.1) Beyond Technicalities: Provenance as Infrastructure
External identifier properties are more than technical markers – they act as enchanted keys in the vast dungeon of cultural metadata, unlocking provenance trails and allowing both researchers and machines to follow an item’s journey across collections. They mark the difference between:
I found this church building somewhere in Germany and
This is “Our Lady Church”,8 indexed by Bildarchiv Foto Marburg under P12754 (Bildindex PID: 00013174109), and connected via P227 (GND ID 4762518-110)
In the second case, provenance becomes queryable, reusable, and trustworthy.
(2.2) Case for External Identifiers
Institutions like DDK or the National Historical Museums of Sweden have invested in the creation of properties for authority control11 because such properties make sustainable Linked Open Data possible. They:
Secure provenance and trust: each object’s origin and authority become traceable through persistent institutional references;
Foster interoperability: by linking institutional repositories such as Bildindex or the collections database of the National Historical Museums of Sweden with global authority systems (e.g. GND, VIAF, Getty vocabularies), external identifiers connect local metadata ecosystems to the semantic web;
Enable cross-collection research: those identifiers allow harmonized SPARQL queries across collections and institutions, turning distributed records into navigable, interlinked knowledge graphs;
Support iterative round-tripping: institutions can reimport Wikidata-enriched records together with community-added identifiers and contextual data into their local systems for sustainable curation and reuse.
The process of establishing external identifier properties in Wikidata is itself a community affair: property proposals are debated, constraints are set, and consensus determines acceptance (Neubert, 2017; Pellizzari di San Girolamo, 2024). This negotiation embeds institutional authority into a participatory ecosystem.
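As a small illustration of the cross-collection queries this enables, the sketch below assembles a SPARQL SELECT that requires several external identifier properties on the same item. The helper `cross_collection_query` and the resulting query shape are illustrative assumptions, not an official API.

```python
def cross_collection_query(prop_ids):
    """Build a SPARQL query matching items that carry ALL of the given
    external identifier properties (e.g. Bildindex PID P12754, GND ID P227)."""
    triples = "\n  ".join(
        f"?item wdt:{pid} ?id{i} ." for i, pid in enumerate(prop_ids)
    )
    return f"SELECT ?item WHERE {{\n  {triples}\n}}"

# One item, two authority systems: Bildindex and GND joined in one query.
print(cross_collection_query(["P12754", "P227"]))
```

Run against the Wikidata Query Service, a query of this shape returns exactly the items where local metadata ecosystems and global authority systems intersect.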
(2.2.1) Illustrative Example: Cross-Collection Interoperability in Practice
One practical demonstration of this interoperability can be seen in the way external identifiers accumulate across cultural heritage datasets once they converge in Wikidata. Figure 1 visualizes the 40 Wikidata items linked with a Bildindex PID that carry the largest number of additional external identifiers.12 It illustrates how external IDs turn distributed metadata into an integrated semantic landscape: an effect that cannot be achieved within isolated institutional databases.

Figure 1
Top 40 Wikidata items linked with Bildindex PID ranked by number of additional external identifiers.
(3) The Reconciliation Quest: From Chaos to Campaign
Reconciling institutional datasets with Wikidata and authority files such as GND rarely follows a straight path. It is less like running a machine and more like embarking on a long campaign full of side quests, hidden traps, and occasional treasures.
Each project idea begins with a party of curators and data stewards (if available), armed with spreadsheets and reference files, setting out to map their local worlds onto the global realm of Wikidata.
Matching a 17th-century church, manor house or castle across Bildindex, GND, and Wikidata seldom produces a clean one-to-one match. The same building may appear once as an architectural monument, again as a photographic subject, and yet again as part of a broader heritage ensemble. Variants in naming conventions, transliteration, conceptual vagueness, or incomplete geographic data mean that ambiguity is the rule rather than the exception.
(3.1) The Ontological Challenge: When Classes Blur
Before the first roll of the dice, every reconciliation campaign faces a deeper challenge: Wikidata’s ontology itself, which is large, multi-domain, and community-created. Its openness (one of its greatest strengths) also introduces inconsistencies that complicate reconciliation.
Ambiguous class or concept boundaries (Pintscher & Heintze, 2021), questionable subclass relationships, and the frequent confusion between instance of (P31) and subclass of (P279) make it difficult to determine which concept truly corresponds to a given heritage entity.
For example, a “Schloss” may be modelled as a fortified building, a residential palace, or simply as a building; “Kirche” may appear as both church building and heritage monument. None of these interpretations is wrong, but each represents a different modelling tradition. The confusion grows where the boundary between a building and a legal body dissolves. An entry for a “school,” complete with address, coordinates, and founding date, may refer to the institution, not the building; the same holds for “town halls”, “museums”, or “banks”. A current example of this is the Wikidata entry for Herzog August Library in Wolfenbüttel,13 where the instance of assignments themselves are already misleading, as they characterise the item as both a corporation and a building, i.e. the seat of the institution (Figure 2).

Figure 2
Example of discrepancies in the type determination of a Wikidata item.
These discrepancies reflect Wikidata’s community-driven evolution: contributors from archaeology, architecture, and art history bring distinct terminologies and ontological expectations, producing a patchwork of overlapping conceptual hierarchies. This heterogeneity has methodological consequences. When reconciling institutional datasets, curators enter the game board of competing models and conventions. They act as pathfinders, deciding which ontology takes the lead in each encounter: when to adapt to Wikidata’s logic, and when to defend the nuances of their institutional system.
The challenge is to find a stable mapping between a local classification system (like Bildindex’s internal typology) and Wikidata’s global, evolving schema. Too strict an alignment risks flattening historical nuance; too flexible an interpretation undermines comparability and machine readability.
Recognizing this balance is essential: reconciliation is not just a technical process of matching, but an ongoing negotiation between ontological precision and cultural context. This is where tools like OpenRefine become crucial, not as automatic solvers, but as spaces where curators can test, inspect, and document these negotiations with transparency.
(3.2) OpenRefine as Game Master
If reconciliation is seen as a game campaign, OpenRefine acts as its Game Master: the one who keeps the story coherent, balances automation and agency, and ensures that every move is recorded. It has become a standard workbench for reconciling heritage metadata14 because it combines powerful automation with human oversight, backed by strong community support and multiple free tutorials.15
In the Bildindex context, reconciliation extends beyond identifying the right entity: not only must the main entity (e.g. a building or monument) be identified, but its type (instance of), location, and other contextual attributes must also be aligned. Curators reconcile not only what the object is, but where and how it exists within the semantic landscape – linking coordinates, related persons (such as the architect), dates of inception, and even its fate (still standing, rebuilt, or long vanished into ruin). Only when these contextual properties are properly linked can the record later be transformed into a complete, schema-compliant statement set for Wikidata or external systems like the GND.
This multi-level reconciliation is what turns OpenRefine from a simple matching tool into a genuine semantic modelling environment. In practice, each Bildindex reconciliation batch follows a repeatable workflow: preprocessing (normalisation, filtering), entity matching and validation, clustering-based disambiguation, semantic modelling through schema alignment, and finally export into Wikidata or external authority formats.
In detail, OpenRefine enables (among other things):
Data conditioning and cleaning: e.g. normalising labels, cleaning and enrichments (like adding coordinates via API based on address data), and filtering candidates through facets;
Contextual reconciliation: matching not only by string similarity but by additional features such as location, inception date, geographic coordinates and subject header;
Clustering and validation: identifying near-duplicates through algorithms such as fingerprinting and Levenshtein distance, or by defining custom clustering rules that combine multiple criteria to reflect domain-specific matching logic.16 The latest version of OpenRefine extends these capabilities with flexible distance-based and bin-defined clustering functions, allowing users to group values dynamically based on expressions rather than fixed algorithms;17
Schema alignment: structuring results for upload to Wikidata and Wikimedia Commons with clear provenance and license information as well as structured data;
Template export: transforming curated and reconciled data into structured output formats for external systems.
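The clustering step listed above can be approximated outside OpenRefine as well. The sketch below reimplements the key-collision (“fingerprint”) idea in plain Python – lowercase, strip accents and punctuation, sort unique tokens – so that spelling variants of the same building collapse into one candidate cluster for curatorial review. Function names and the sample labels are illustrative.

```python
import unicodedata

def fingerprint(label: str) -> str:
    """Key-collision fingerprint in the spirit of OpenRefine: lowercase,
    strip accents and commas, sort the unique tokens."""
    ascii_form = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode()
    tokens = sorted(set(ascii_form.lower().replace(",", " ").split()))
    return " ".join(tokens)

def cluster(labels):
    """Group labels whose fingerprints collide – candidates for review,
    not automatic merges."""
    groups = {}
    for label in labels:
        groups.setdefault(fingerprint(label), []).append(label)
    return [group for group in groups.values() if len(group) > 1]

# Two spellings of the same church collapse into one candidate cluster,
# while the unrelated castle stays unclustered.
labels = ["Marburg, Elisabethkirche", "Elisabethkirche Marburg", "Landgrafenschloss"]
print(cluster(labels))
```

The crucial design point carries over from OpenRefine: clustering only proposes groups; whether a group represents a true duplicate remains a curatorial decision.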
Despite its versatility, OpenRefine is not an end in itself. Its true strength lies in how it enables data to move between systems that were never designed to talk to one another.
Through various export functions and the possibility of a template export, reconciled records can be reshaped into formats compatible with external infrastructures such as the GND (MARC Authority) or museum collection systems (often LIDO XML). The capacity to translate formats ensures that reconciliation outcomes do not remain trapped in one platform but circulate across the broader knowledge ecosystem.
Of course, even the best game campaigns meet their limits. Reconciliation accuracy drops when labels deviate strongly from Wikidata entries, collaboration is limited by OpenRefine’s single-user access design, and very large datasets can push the software to its performance limits.
Yet these constraints are part of the adventure: they remind curators that reconciliation is not a push-button task but an iterative quest requiring both technical skill and curatorial insight.
(4) Constraints and the Bildindex PID
Once all our nicely enriched datasets enter the broader semantic ecosystem of Wikidata, they encounter another layer of structure and discipline: constraints. Constraints define what is allowed to connect, which values are acceptable, and how authority relationships must be maintained. The Bildindex PID (documented in the corresponding property talk18) illustrates both the power and limits of constraints.
It carries a format constraint, ensuring that only valid identifiers are used, and a type constraint, which expects that the linked entity is an architectural structure, artwork, or heritage site. In theory, this maintains consistency: every Bildindex PID should reference a stable, tangible object which is either an instance of (P31) art work (Q838948), artwork series (Q15709879), artificial physical object (Q8205328), or building (Q41176), or a subclass thereof.
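A local pre-upload check can mirror the format constraint before data ever reaches Wikidata. In the sketch below the ten-digit pattern is an assumption inferred from sample identifiers such as 0001317410; the authoritative regular expression is the one documented on the property page.

```python
import re

# Assumed format: ten digits, inferred from sample PIDs like "0001317410".
# The authoritative regex is defined in the property's format constraint.
PID_PATTERN = re.compile(r"^\d{10}$")

def check_pid(value: str) -> bool:
    """Return True if the value matches the assumed Bildindex PID format."""
    return bool(PID_PATTERN.fullmatch(value))

print(check_pid("0001317410"))   # a well-formed PID passes
print(check_pid("obj00134"))     # an old URL fragment does not
```

Running such a check over a whole OpenRefine column before export catches malformed identifiers locally, so that the only constraint reports left to discuss are the genuinely interesting type mismatches.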
In practice, however, the Bildindex corpus is very heterogeneous. It contains not only buildings, but also:
decorative elements (e.g. portals, reliefs, altars),
ensemble designations (e.g. Altstadt Marburg),
architectural fragments,
interiors or reconstructed views,
drawings,
and occasionally even documentation photographs of lost or hypothetical structures.
When these items are reconciled and linked with Wikidata, constraints frequently raise warnings or false positives. A record for a portal may violate the “instance of: architectural structure” expectation, while an ensemble record may not fit neatly into any approved item type. Even valid architectural entries can trigger warnings if they lack attributes expected for artworks – such as dimensions (P2048) or made from material (P186) – because Wikidata’s constraint system, designed across multiple domains, sometimes assumes shared modelling logic where none exists.
Instead of viewing this as a failure, it is more productive to treat constraint violations as feedback: small alarms that something about the modelling or even the source data system needs a second look or perhaps a rethink of Wikidata’s ontology itself.
(4.1) Why Constraints Matter
Despite these frictions, constraints are indispensable for the integrity of the Linked Open Data ecosystem. Even though the Bildindex PID has sparked discussions, constraint violations are not traps or fatal errors but guiding markers in the iterative curation process. Several users have pointed out that many of Wikidata’s global constraints – such as those requiring subject (P180), genre (P136), or dimensions (P2048) – are optimized for artworks rather than for buildings or architectural ensembles. As a result, every use of property P12754 (Bildindex PID) on a building entry can trigger one or more violation notices, which some community members interpret as a structural incompatibility. This critique is valid within Wikidata’s logic but misses the institutional perspective. The Bildindex corpus represents a mixed typology of cultural heritage objects (artworks, photographs, monuments, and architectural records) whose shared identifier serves not aesthetic classification but provenance and interoperability. Creating separate properties for each material or object type (paintings, sculptures, buildings, etc.) would not only fragment the institutional linkage but also undermine Wikidata’s own principle of cross-domain integration and its role as a global hub for linking local data sources (Neubert, 2017; Tharani, 2021). Instead of introducing new properties, a more sustainable approach is to refine constraint logic and documentation, clarifying that certain violations are domain-specific exceptions rather than true modelling errors.
Seen this way, the Bildindex PID does not break Wikidata’s model but tests its flexibility. It highlights the need for adaptive validation mechanisms capable of accommodating the diversity of heritage documentation, where “consistency” sometimes means acknowledging well-defined exceptions. Authority must remain flexible: too rigid an ontology risks excluding local documentation realities, yet too much openness undermines consistency and trust. The challenge lies in maintaining both stability and adaptability and finding a balance that defines sustainable metadata governance in the Wikidata-Bildindex context.
This balance is harder to achieve in more traditional authority systems. The German National Library, for instance, has long announced its intention to extend the circle of GND users and become more inclusive toward cultural heritage institutions,19 a promise often delayed by its own bureaucratic gravity. Its rulebook may keep the game orderly, but it leaves little room for side quests where small experiments or quick contributions from museums and archives rarely fit the system. In contrast, Wikidata’s open-ended ontology, which may be messier but more permeable, has proven far more effective in allowing heritage data to enter the Linked Open Data ecosystem at scale. It embraces iteration over perfection, encouraging collaboration rather than compliance, and thus manages to achieve in practice what the GND still struggles to operationalise: a living, evolving infrastructure for shared authority.
(5) Duplication and Ambiguity: Metadata’s Hydra
Duplication is metadata’s hydra: cut off one head, and two more appear. In the architectural domain, duplicates proliferate not because of negligence, but because the same building inhabits multiple descriptive realities. In Bildindex, architectural objects are documented by different institutions across multiple descriptive levels as works, as sites, and as photographic subjects, reflecting its dual function as an image archive and a documentation database. This layered documentation often leads to overlapping entries for the same physical structure, producing multiple identifiers that correspond to different curatorial perspectives rather than data errors.
(5.1) Where Duplicates Come From
In Bildindex, duplication arises from historical cataloguing practices and varying descriptive granularity.
Older inventories separated object and image metadata; later systems integrated them, sometimes duplicating entries during data migrations.
Architectural ensembles such as Landgrafenschloss Marburg20 generate recursive duplication: each substructure (tower, courtyard, gate) may appear both as an independent entity and as part of the ensemble.
Reconciling these against Wikidata creates another layer of complexity: Wikidata expects a single Q-ID per distinct entity type, but curatorial logic often favours hierarchical, interrelated records.
Duplication thus becomes a by-product of semantic mismatch between local documentation systems and Wikidata’s item-centric model. The problem is not redundancy itself, but the ambiguity of equivalence: when are two records “the same,” and when are they legitimately different perspectives on a shared referent?
(5.2) Strategies for Identifying and Managing Duplicates
Within Bildindex, duplication is not a binary concept. True duplicates, records that describe the same object with redundant metadata from the same data provider, are targeted for consolidation. However, the system also contains multiple valid identifiers for distinct representations or documentation levels of the same entity.
In other words, a single Wikidata item can legitimately reference several Bildindex PIDs, without those entries being considered duplicates inside Bildindex itself (see chapter 5.3.3 for more details).
Each identifier corresponds to a specific curatorial context (what was photographed, catalogued, or described) rather than to the abstract notion of “the building itself”. From Wikidata’s global, item-centric perspective, these may converge under one Q-ID; from Bildindex’s archival logic, they remain distinct but related manifestations of the same referent.
To manage this complexity, the Bildindex workflow combines reduction, linkage, and transparency:
Detection and review: during reconciliation, likely duplicates are flagged but are only merged when the metadata and provenance clearly overlap.
Cross-level mapping: reconciliation establishes explicit relationships rather than forcing merges, preserving each record’s provenance and descriptive focus.
Authority-based anchoring: each record that remains distinct is linked to stable identifiers (such as GND and Wikidata IDs), ensuring that multiple representations of the same object remain clearly connected (which simply means, multiple Bildindex PIDs may exist, but must be united within one Wikidata item thus linking them together).
This approach recognizes a semantic distinction between duplicates and legitimate parallel descriptions. While genuine redundancies are resolved, representational diversity is preserved as a feature of cultural documentation, not a flaw. In this hybrid state, a Wikidata entity may point to several Bildindex PIDs, each illuminating a different archival facet of the same architectural object and serve as a kind of landing page for the entity.
(5.3) Making Clustering Visible in Wikidata
Clustering21 and disambiguation do not end in OpenRefine; their true value emerges once the results become visible in Wikidata itself. Each modelling decision – whether a merge, a split, or a newly created item – translates curatorial reasoning into a traceable data structure. This visibility transforms reconciliation from a backstage operation into part of the scholarly record: users can inspect, query, and verify how ambiguity has been resolved. In the Bildindex workflow, this often involves distinguishing between different manifestations of the same cultural entity – a recurring challenge in heritage documentation. Three examples illustrate how such cases are represented in Wikidata through semantic relationships rather than manual merging.
(5.3.1) Disentangling paired Works
In the case of pendant paintings like Italian Landscape (Morning) and Italian Landscape (Evening), the legacy Bildindex metadata reflects an older cataloguing logic. In Wikidata, there are two items22 connected through the property pendant of (P1639). In Bildindex, however, there are three records: one combined entry describing the pair together (Figure 3), and two individual records for each painting (Figure 5).

Figure 3
Counterpart paintings in Bildindex, summarized in one combined entry describing the pair together.
Originally, both Wikidata items pointed to the same combined Bildindex record via the old external identifier “02558500” (P2092). This reflected the historical catalogue structure but introduced ambiguity: two separate works shared one provenance link (Figure 4).

Figure 4
Wikidata item “Italian Landscape (Evening)” with former and new Bildindex identifier.

Figure 5
Individual Bildindex representation of “Italian Landscape (Evening)”.
With the new Bildindex PID, the linkage has been refined, and each painting now connects to its own individual Bildindex record, while the pendant of (P1639) relation preserves their connection as a matched pair.
The same logic can be demonstrated through a SPARQL query that retrieves all pendant pairs in Wikidata which include a Bildindex PID. This example23 illustrates how connected works are modelled through explicit semantic relationships – each item linked to its correct institutional record yet connected through pendant of (P1639). When executed in the Wikidata Query Service, this query lists pairs such as the one just described, each item holding its own Bildindex PID (if available), while the pendant of link maintains the intellectual and art-historical connection between them. The result is not a flattened dataset, but a network of context: provenance and authority data remain distinct yet interoperable.
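A reconstruction of such a query – an assumption on my part, not the exact query referenced in the footnote – pairs pendant of (P1639) with the Bildindex PID (P12754):

```python
# Reconstructed sketch (not the footnoted original) of a query for pendant
# pairs in which the first partner carries a Bildindex PID; the partner's
# PID is retrieved optionally, since not every pendant has one.
PENDANT_QUERY = """
SELECT ?work ?pendant ?pid ?pendantPid WHERE {
  ?work wdt:P1639 ?pendant ;                      # pendant of
        wdt:P12754 ?pid .                         # Bildindex PID of ?work
  OPTIONAL { ?pendant wdt:P12754 ?pendantPid . }  # partner's PID, if any
}
"""

print(PENDANT_QUERY)
```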
(5.3.2) Modelling Hierarchies: From Ensemble to Element
Not all clustering results in duplication or succession; sometimes, it reveals hierarchical depth. Architectural documentation often operates across multiple descriptive layers – a monument ensemble, its constituent buildings, and their individual elements. In Bildindex, this structure is mirrored in how objects are catalogued: an ensemble such as the Saint Peter’s Church, Fritzlar24 appears as a parent record, while subordinate entries describe its buildings, sculptures, and decorative programs.
In Wikidata, this layered structure is expressed through the properties has part(s) and part of. These properties translate the curatorial hierarchies of Bildindex into explicit, queryable hierarchical relationships,25 replacing the opaque clustering logic of local databases with transparent, machine-readable statements. Each returned result documents an explicit part-whole relation: a parent entity with a Bildindex PID that contains one or more sub-entities, some of which may also carry their own institutional identifiers.26 Researchers can now traverse from the whole to its parts, or inversely, reconstruct ensembles from scattered elements – a kind of digital archaeology that makes curatorial logic transparent and computationally accessible.
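A hedged sketch of such a part-whole query might descend from an ensemble to its elements via has part(s) (P527); the query text is illustrative, not the canonical one for Bildindex hierarchies.

```python
# Illustrative query walking from a parent record with a Bildindex PID to
# its parts via has part(s) (P527); some parts carry their own PID, which
# is therefore matched optionally.
HIERARCHY_QUERY = """
SELECT ?ensemble ?ensemblePid ?part ?partPid WHERE {
  ?ensemble wdt:P12754 ?ensemblePid ;        # parent record's PID
            wdt:P527 ?part .                 # has part(s)
  OPTIONAL { ?part wdt:P12754 ?partPid . }   # part's own PID, if any
}
"""

print(HIERARCHY_QUERY)
```

Inverting the direction with part of (P361) would support the opposite traversal, reconstructing ensembles from scattered elements.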
(5.3.3) When one Item Carries Several Bildindex PIDs
Not every case of “multiple Bildindex PIDs on one Wikidata item” signals a mistake. In Wikidata, these parallel records converge into a single Q-item when they describe the same entity; thus, multiple PIDs on that item simply record multiple institutional manifestations of the same object. In other words: one building, many scrolls in the archive.
Still, such cases are analytically valuable. Multiple PIDs27 can also reveal inconsistencies in the underlying catalogue – two Bildindex records that claim to describe the same entity, but differ subtly in name, scope, or location. In this sense, Wikidata becomes a mirror held up to the source system: every time an item holds more than one PID, it invites the question why.
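Such cases can be surfaced systematically. The sketch below groups items by their number of Bildindex PIDs and keeps those with more than one – exactly the records that invite the question why; the query is an illustrative assumption.

```python
# Illustrative query listing Wikidata items that carry more than one
# Bildindex PID – candidates for a closer look at the source catalogue.
MULTI_PID_QUERY = """
SELECT ?item (COUNT(?pid) AS ?pidCount) WHERE {
  ?item wdt:P12754 ?pid .
}
GROUP BY ?item
HAVING (COUNT(?pid) > 1)
"""

print(MULTI_PID_QUERY)
```

The result list doubles as an internal quality report for Bildindex itself: each multi-PID item either documents legitimate parallel descriptions or flags a catalogue inconsistency worth reviewing.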
(5.4) The Fog of Metadata: Where Certainty Ends and Interpretation Begins
Ambiguity in architectural metadata is not merely a nuisance but reflects the historicity of the documentation itself. The same site can be described as a Burg in one era and a Schloss or manor house in another; reconstructions, losses, renamings and even translocations blur categorical boundaries. In such cases, authority files like GND provide valuable anchors, but cannot fully resolve the ontological uncertainty embedded in the data. Wikidata’s flexibility allows curators to express this plurality through properties like has part (P527), part of (P361), structure replaced by (P167) or structure replaces (P1398), yet these are sometimes interpretive rather than definitive decisions (e.g. at what stage does the renovation of a historical church cease to be restoration and become a re-creation, and thus turn into a new entity?28). There is rarely a single “correct” level of granularity for entities such as “the tower of a church” or “the hall within a castle.” Ambiguity, then, becomes an interpretative question of cultural heritage metadata: it marks the tension between what can be standardized and what must remain context-dependent, interpreted by domain experts. Rather than trying to eliminate this ambiguity, reconciliation workflows make it visible and accountable: each mapping decision is documented, and each possible alternative is preserved through provenance links (via external identifiers) to the corresponding Bildindex record and the Wikidata item. In this sense, ambiguity should not only be seen as an obstacle to data quality, but also as a sign of the diversity of interpretation within distributed documentation systems.
(5.5) The Value of Controlled Duplication
When managed appropriately, duplicates are not necessarily errors, but rather a structural feature of cultural heritage metadata: one that allows different interpretive frames (historical, architectural, photographic) to coexist within a shared data space. The challenge is not to eliminate every overlap, but to understand and document why it exists. Working with OpenRefine’s reconciliation service is valuable because every uncertain match can be found and described. This process cannot be fully automated and will always remain intellectual work. However, it offers the opportunity to delve deep into one’s own data repository and uncover any underlying data-quality dungeon (to stay with the metaphor). The result may ultimately be a consolidated concordance list linking each Wikidata item to its corresponding Bildindex PID.
In parallel, Wikidata’s constraint reports help identify possible type mismatches – e.g. an item in Wikidata for a building with a Bildindex PID must also be of the type (instance of) building or a subclass thereof – or other violations. In this way, they serve as an external quality check, which can be conveniently adapted to the requirements or needs of the respective institution (see chapter 2, External Identifier Properties). Together, this dual documentation helps distinguish meaningful variation from genuine redundancy.
Controlled duplication thus becomes a curatorial method rather than a technical problem. Some overlaps reflect parallel descriptions of the same structure; others represent distinct perspectives on a shared referent. By documenting these relationships explicitly, Bildindex maintains both clarity and contextual richness.
OpenRefine’s reconciliation history, together with Wikidata’s constraint checks, helps curators identify when variation in records reflects real differences rather than errors. The goal is not to exterminate the hydra, but to train it and to let multiple heads coexist under clear identifiers, documented relationships, and transparent provenance.
Iterative reconciliation keeps collections alive. Each pass through the data unearths new connections and interpretations.
As in every good dungeon, the treasure lies behind the monsters: each resolved duplication enriches both systems, sharpening Wikidata’s coverage and clarifying institutional metadata for future reuse.
(6) The Round-tripping Principle: Wikidata as the Teleportation Circle and Gateway Between Worlds
Round-tripping, the process of exporting data to Wikidata, enriching it collaboratively, and reimporting it into institutional systems, has become a defining practice in linked open cultural heritage (Larsson et al., 2019). Projects like those of the Swedish National Heritage Board and Bildindex demonstrate that Wikidata is not merely an external platform but part of a dynamic data ecosystem in which authority control and community curation may reinforce one another.
Yet, Wikidata’s role extends beyond that of a circulation hub: it also serves as an entry point and a gateway into institutional knowledge. For many users and researchers, the first encounter with a monument, painting, or building happens not through a museum portal, but through Wikidata (e.g. surfacing in Wikipedia, Google’s Knowledge Graph, or SPARQL-driven visualisations). From there, property links such as Bildindex PID lead directly to the institutional record, transforming Wikidata into the front door of discovery. In this sense, Wikidata not only disseminates but also directs attention. Its open, multilingual environment acts as a connective interface between public curiosity and scholarly infrastructure.
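The gateway mechanism behind such property links is simply a formatter URL: the identifier value stored in Wikidata is expanded into a resolvable institutional record link. The URL pattern below follows the id.bildindex.de links cited in the notes (e.g. http://id.bildindex.de/thing/0001317410); the function itself is our illustrative wrapper.

```python
def bildindex_url(pid):
    """Expand a Bildindex PID into the resolvable institutional record URL,
    following the id.bildindex.de pattern cited in the notes."""
    return f"http://id.bildindex.de/thing/{pid}"

# A user arriving via Wikidata follows this link straight into Bildindex:
record = bildindex_url("0001317410")
```

This is exactly how Wikidata’s external identifier properties work in general: the property stores only the bare identifier, and a formatter URL template turns it into the front door of the institutional system.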
This round-tripping process brings clear benefits:
Sustainability: distributed work across institutions and communities.
Innovation: community contributions often reveal overlooked connections.
Visibility: enriched records feed into public-facing platforms.
However, there are a few recurring challenges:
Quality assurance: community edits may introduce inconsistencies.
Technical hurdles: extracting enriched subsets demands expertise.
Institutional trust: reluctance to ingest “user-generated” changes.
SPARQL queries themselves have become analytical instruments. They allow curators to see what identifiers make possible: tracing the reach of external identifier properties, detecting gaps in coverage, and revealing cross-collection patterns invisible in siloed databases. In this sense, visualisation is not limited to maps or dashboards; it unfolds directly within the logic of the query. Without identifier properties like Bildindex PID, such structured exploration and comparative analysis across datasets would simply not be possible.
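A coverage-gap query of the kind described above can be sketched as follows. Composing the query as a string keeps the example runnable offline; in practice it would be sent to the Wikidata Query Service. The sketch looks for items that carry a Bildindex PID (P12754) but no coordinate location (P625); the choice of P625 as the “missing” property is our illustrative example of a coverage gap, not a query taken from the paper’s notes.

```python
def coverage_gap_query(identifier_prop="P12754", missing_prop="P625", limit=100):
    """Compose a SPARQL query for items that have the identifier property
    but lack another expected statement (a coverage gap)."""
    return f"""
SELECT ?item ?pid WHERE {{
  ?item wdt:{identifier_prop} ?pid .
  FILTER NOT EXISTS {{ ?item wdt:{missing_prop} ?value . }}
}}
LIMIT {limit}
""".strip()

query = coverage_gap_query()
```

Swapping the property parameters turns the same template into other diagnostics, e.g. items with a Bildindex PID but no image, or no GND link: each variation is both a diagnostic and a discovery tool in the sense described above.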
Each query thus becomes both a diagnostic and a discovery tool, turning Wikidata into a living laboratory for understanding cultural data at scale.
All enrichment steps comply with Wikidata’s CC0 licensing model, ensuring legal reusability and transparent provenance tracking across institutional systems.
Projects like those of the Swedish National Heritage Board and the Nordic Museum demonstrate that round-tripping, when validated through expert review, can sustain long-term data stewardship (Fagerving, 2023, p. 253).
(7) Community Engagement: Where Metadata becomes Social
Wikidata is not just a database but a living community, shaped by debate, consensus, and negotiation. Every property, every class, every modelling choice is the result of discussion among editors, data curators, and domain specialists. To participate in this ecosystem means being willing to enter an ongoing dialogue about questions of scope, hierarchy, and ontology.
Projects succeed when institutions step out of their fortresses and engage proactively with this open arena. Productive collaboration often begins with small, concrete acts:
Participating in workshops and editathons: contributing examples, clarifying domain needs, and learning community conventions;
Training staff in Wikidata editing and modelling norms: understanding how properties are proposed, discussed, and approved;
Building bridges with volunteers and Wikimedia affiliates: establishing mutual trust and communication channels that outlast individual projects.
The Swedish project “Usable Authorities for Data-driven Cultural Heritage Research” exemplifies this model: regular seminars, handbooks, and peer learning sustain digital literacy (Fagerving, 2023, p. 226). Similarly, the Bildindex team participates in regular Wikidata community meetings29 and contributes to discussions on property usage and modelling. Unlike many contributors, who work voluntarily after hours, members of the Bildindex team are able to participate as part of their professional responsibilities: a rare but important precedent, since much of the work in the Wikiverse still rests on voluntary participation.
(8) Institutional Transformation: From Fortress to Ecosystem
Integrating external identifier properties and engaging with Wikidata are not merely technical upgrades; they signal a deeper cultural transformation. Institutions long accustomed to controlling their data behind stone walls now step into open arenas where collaboration, transparency, and iteration are the norm (a transition that can, of course, be frightening). DDK’s sustained participation illustrates this transition in practice. Bridging the gap between volunteer passion and institutional commitment is one of the decisive challenges for the future of open cultural heritage.
When institutions actively engage (e.g. by dedicating staff time, offering recognition for community work, and integrating collaborative editing into official workflows), they transform not only their data practices but their organisational culture. They begin to see knowledge stewardship as a shared quest, not a solitary duty. This institutional recognition validates the work of volunteers and ensures long-term continuity beyond project funding cycles. As experience shows in the Bildindex case and the Swedish “Usable Authorities” project, formal participation also strengthens internal expertise: staff trained in Wikidata conventions bring back new modelling insights, data literacy, and a mindset of openness that enriches the entire institution.
Ultimately, institutional transformation is not just about publishing Linked Open Data, it is about learning to operate in the open. It requires trust in shared authority, willingness to negotiate meaning in public, and mechanisms to sustain participation beyond individual enthusiasm. Or, to borrow once more from the Dungeons & Dragons metaphor: institutions that once guarded their treasures now join the adventuring party. They bring resources, stability, and expertise (but the journey only succeeds when they travel together with the volunteers who have long kept the map alive).
(9) Conclusions: From Metadata Dungeon to Open Quest
The integration of external identifier properties into Wikidata reveals that metadata curation in the humanities is not merely a technical exercise but an ongoing adventure; one that blends institutional expertise, community collaboration, and ontological negotiation. By embedding provenance into Wikidata through properties such as Bildindex PID, institutions create stable gateways between local databases and the global semantic web. What begins as a reconciliation task becomes a sustained quest for interoperability, transparency, and reuse.
The work with the Bildindex corpus illustrates this shift. At a broader level, this experiment confirms that sustainable Linked Open Data is a social as much as a technical construct. Wikidata functions as both an archive and a community, a place where curators, data scientists, and volunteers negotiate meaning and structure. Institutional participation such as that of DDK resembles a scouting party venturing beyond the walls of traditional cataloguing, mapping mostly unknown territories of community collaboration. Errors are part of the game, but each one helps chart the route for those who follow. Nevertheless, the path from isolated databases to open ecosystems depends on trust, literacy, and shared stewardship.
To sustain this transformation, institutions should:
apply for external identifier properties to anchor provenance and authority,
define constraints carefully but avoid excessive rigidity,
embrace iterative reconciliation workflows using tools like OpenRefine,
visualize results30 to make metadata work visible beyond technical audiences,
practice validated round-tripping to ensure data enrichment circulates back into institutional systems, and
invest in staff training to build confidence in collaborative, open data curation.
Most importantly, metadata should not be treated as a static catalogue but as a living system – one that evolves through interaction, correction, and reuse. Embarking on this shift requires a degree of institutional courage: much like opening a new door in a dungeon, one cannot know in advance whether the next step reveals complexity, clarity, or both. The move from fortress to ecosystem, from dungeon to open quest, is both conceptual and practical: it redefines cultural heritage data as a shared public infrastructure. Each reconciled item, each property, and each SPARQL query becomes part of a larger adventure: an open, collective campaign to make cultural knowledge more connected, comprehensible, and alive.
Notes
[1] https://www.bildindex.de (last accessed: 2025-11-11), referred to as “Bildindex”.
[2] https://shm.se/en/blog-article/usable-authorities-for-data-driven-cultural-heritage-research/ (last accessed: 2025-11-11).
[3] https://www.dnb.de/EN/Professionell/Standardisierung/GND/gnd_node.html (last accessed: 2025-11-11).
[4] https://sparql.dnb.de/gnd/q8SDSe (last accessed: 2025-11-11).
[5] See discussion on https://www.wikidata.org/wiki/Property_talk:P2092 (last accessed: 2025-11-11).
[6] Bildindex der Kunst und Architektur PID (in short: Bildindex PID): http://www.wikidata.org/entity/P12754 (last accessed: 2025-11-11).
[8] http://www.wikidata.org/entity/Q1823922 (last accessed: 2025-11-11).
[9] http://id.bildindex.de/thing/0001317410 (last accessed: 2025-11-11).
[10] https://d-nb.info/gnd/4762518-1 (last accessed: 2025-11-11).
[11] National Historical Museums of Sweden ID: http://www.wikidata.org/entity/P9495 (last accessed: 2025-11-11).
[12] SPARQL query: https://w.wiki/G2zp (last accessed: 2025-11-11).
[13] https://www.wikidata.org/entity/Q663820 (last accessed: 2025-11-11).
[14] https://openrefine.org/usage (last accessed: 2025-11-11).
[15] https://openrefine.org/external_resources and https://forum.openrefine.org (last accessed: 2025-11-11).
[16] https://openrefine.org/docs/manual/cellediting#custom-clustering-methods or https://openrefine.org/docs/technical-reference/clustering-in-depth (last accessed: 2025-11-11).
[17] https://github.com/OpenRefine/OpenRefine/issues/4301 (last accessed: 2025-11-11).
[18] https://www.wikidata.org/wiki/Property_talk:P12754 (last accessed: 2025-11-11).
[19] https://www.dnb.de/EN/Professionell/Standardisierung/GND/gnd_node.html#doc147904bodyText3 (last accessed: 2025-11-11).
[20] Landgrafenschloss Marburg: http://id.bildindex.de/thing/0001350452 (last accessed: 2025-11-11).
[21] Clustering in OpenRefine occurs before upload to Wikidata and therefore has no measurable impact on SPARQL performance; once an item is created or merged, the query engine operates on single Q-IDs rather than cluster groups. The main effect of clustering is therefore semantic cleanliness, not computational load. Performance considerations arise only indirectly, as cleaner entity structures reduce the need for disambiguation in complex queries.
[22] http://www.wikidata.org/entity/Q106674693 and http://www.wikidata.org/entity/Q106674704 (last accessed: 2025-11-11).
[23] SPARQL query: https://w.wiki/G2uG (last accessed: 2025-11-11).
[24] https://www.wikidata.org/entity/Q1265636 (last accessed: 2025-11-11).
[25] SPARQL query for items with Bildindex PID and “has part(s)”: https://w.wiki/FwoH (last accessed: 2025-11-11).
[26] Hierarchical queries also reveal missing links: if a parent item has a Bildindex PID but its part(s) do not, this often signals Bildindex records that can still be connected.
[27] SPARQL query for items with multiple Bildindex PIDs: https://w.wiki/G2uR (last accessed: 2025-11-11).
[28] An example of a church building that serves as the successor to another structure: http://www.wikidata.org/entity/Q108902686 (last accessed: 2025-11-11).
[29] https://meta.wikimedia.org/wiki/WikiKult_-_Offene_Kulturdaten (last accessed: 2025-11-11).
[30] The property talk of Bildindex PID contains a list of examples of interesting SPARQL queries, including visualisations such as displaying all entries with a corresponding identifier property P12754 on a world map: https://www.wikidata.org/wiki/Property_talk:P12754 (last accessed: 2025-11-11).
Acknowledgements
We explicitly would like to thank Alexander Winkler (User:Awinkler3) and Maximilian Kristen (User:Kristbaum), trusted companions on the long quest to create Bildindex der Kunst und Architektur PID (P12754) on Wikidata. Their insight, guidance, and public support ensured the proposal’s safe passage through the community’s gates of consensus and helped anchor it firmly in the Wikidata knowledge realm.
Gratitude is also due to the ever-patient Wikidata community – those invisible clerics and archivists who keep the ontology alive – and to the team of information professionals and art historians at DDK, whose collaborative spirit made this campaign not only possible but rewarding.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
Hanna-Lena Meiners: Writing – original draft
Klaus Bulle: Writing – review & editing
