(1) Context and motivation
The long-term preservation of digital research data requires reliable storage infrastructures and checking procedures that preserve files in their full integrity and authenticity, as well as measures to ensure the files’ accessibility (Trognitz et al., 2024). One such measure is adding and storing machine-readable metadata describing the preserved data. The quality, interoperability, and sustainability of metadata are essential to ensure a file’s findability, accessibility, and reusability.
Within the humanities, data are often heterogeneous, multilingual, and created across diverse institutional contexts that do not necessarily adhere to well-established metadata standards, such as DataCite (DataCite, 2024), Dublin Core (DublinCore, 2020), EDM (Isaac & Clayphan, 2013), or LIDO (CIDOC LIDO Working Group, 2025). This makes standardising metadata particularly challenging (Trognitz & Ďurčo, 2018, pp. 222–223). Furthermore, data curators must find ways to bridge the gaps between local data practices and global frameworks of Linked Open Data (LOD) to ensure that datasets remain FAIR – findable, accessible, interoperable, and reusable (Wilkinson et al., 2016). At the Austrian Centre for Digital Humanities (ACDH), the trusted digital repository ARCHE [1], established in 2017, aims to maintain rich, semantically interoperable, and machine-actionable metadata that enable discovery and reuse of the metadata and the data they describe. To this end, all resources retained in ARCHE – from top-level collection descriptions down to individual files and entities like persons and places related to them – are described with metadata according to a bespoke metadata schema, the ARCHE metadata schema (Trognitz & Ďurčo, 2018; Żółtak et al., 2025).
Many datasets arrive at ARCHE with incomplete or inconsistent metadata, requiring curators to verify, enrich, and normalise the provided information. ARCHE curators must also match, i.e. reconcile, the provided named entity records with external authority files (Wikipedia contributors, 2025) like GeoNames [2] for places, the Gemeinsame Normdatei [3] (GND, en. Integrated Authority File) and the Virtual International Authority File [4] (VIAF) for persons and institutions, and PeriodO [5] for temporal periods. All of the authority files supported by ARCHE are already integrated in Wikidata with properties of the type external identifier [6].
Wikidata [7] (Vrandečić & Krötzsch, 2014), established in 2012 as an open knowledge base providing LOD, is the central hub for structured data that is reused in other Wikimedia projects, such as Wikipedia or Wikimedia Commons. Wikidata acts as a secondary database and its records can be edited by anybody. The data, which covers a huge range of topics, is licensed under CC0 and is accessible to both humans and machines via an API or as a dump, free of the rate limits attached to monetisation models such as those introduced for GeoNames (GeoNames, n.d.; Wick, 2010). All these qualities have led Wikidata to become a linking hub connecting various authority files (Haller et al., 2022; Neubert, 2017; Zhao, 2023), including the ones mentioned above. For ARCHE curators tasked with metadata curation and enrichment, Wikidata has become a reliable anchor for finding and identifying missing identifiers for all types of entities.
The ARCHE metadata schema is detailed in Section 2, where two case studies elaborate on named entities, authority files, external identifiers, and their reconciliation. To maximise interoperability and support multilinguality, some properties of the ARCHE metadata schema are linked to controlled vocabularies, which increase the share of normalised input values and accommodate multiple languages. More details on this, and on the collation of controlled vocabularies with Wikidata as a valuable source, are given in Section 3. Section 4 moves beyond ARCHE with three case studies that present how Wikidata is used at the aggregators Kulturpool, CLARIN, and ARIADNE for further metadata enhancement and normalisation.
(2) Metadata curation for ARCHE
Metadata, data about data (Caplan, 2003), should answer basic questions about a dataset and its context to allow for better discoverability, understandability, and reusability (Huvila, 2022). Good metadata, which can be applied at different levels from the collection as a whole down to individual files or data units, provides information about how the data was produced, who was involved in its creation, and what the data is about. Ideally, metadata is provided as accurately and completely as possible in a standard format for best interoperability.
For ARCHE, a bespoke metadata schema was formalised in the Web Ontology Language OWL (Hitzler et al., 2012), based on actual collections ingested into ARCHE and with respect to already established metadata standards (Trognitz & Ďurčo, 2018). Over the years, the schema, which is hosted on GitHub [8], has evolved further to properly describe and disseminate the heterogeneous humanities data the archive is entrusted with. It now stands at version 6.0.0 (Żółtak et al., 2025), comprising six main classes with 11 sub-classes (Figure 1), around 30 further helper classes, 13 annotation properties, and 140 properties. The properties divide into 43 object properties and 97 datatype properties. The latter mainly accommodate free text or formatted numbers for titles and various descriptive attributes; the former are used either to include terms from controlled vocabularies (e.g. licenses, languages, resource type) or to relate to instances of classes representing named entities, such as persons and organisations responsible for the creation, funding, etc. of a resource, or places related to a resource. The named entities themselves include links to external authority files with the property acdh:hasIdentifier. External links are a requirement of the Five Star LOD Principles (Berners-Lee, 2006; Żółtak et al., 2022), which ARCHE adheres to in order to maximise the interoperability between ARCHE and other service providers. Open data is realised by licensing the metadata under CC0.

Figure 1
The ARCHE metadata schema and its main classes, including their sub-classes (rounded boxes). Instances of a class are detailed with object properties and datatype properties (connectors with labels starting with ‘acdh:’). Object properties connect an instance of a class with another instance of a class, e.g. a place or a person is related to a RepoObject (e.g. a Collection) via the property acdh:hasSpatialCoverage or acdh:hasContributor. A datatype property is used to include a string value, e.g. the subject via acdh:hasSubject (connector ends in a square box). Object properties can also be used to connect to a class from another schema, e.g. to a skos:Concept, via acdh:hasLanguage or acdh:hasCategory.
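To make the distinction between object and datatype properties concrete, the following minimal sketch assembles such a description in Python with rdflib. It is illustrative only: the namespace URIs and the exact class and property names are assumptions modelled on the schema excerpt above, and the authoritative definitions live in the OWL file on GitHub [8].

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Namespace and property names are assumptions modelled on the schema
# description above; the OWL file on GitHub is authoritative.
ACDH = Namespace("https://vocabs.acdh.oeaw.ac.at/schema#")
ID = Namespace("https://id.acdh.oeaw.ac.at/")

g = Graph()
g.bind("acdh", ACDH)

collection = ID["exampleCollection"]
vienna = ID["vienna"]

# A collection described with datatype properties (free-text values) ...
g.add((collection, RDF.type, ACDH.Collection))
g.add((collection, ACDH.hasTitle, Literal("Example Collection", lang="en")))
g.add((collection, ACDH.hasSubject, Literal("epigraphy", lang="en")))

# ... and an object property relating it to a named entity.
g.add((collection, ACDH.hasSpatialCoverage, vienna))

# The named entity itself carries links to external authority files.
g.add((vienna, RDF.type, ACDH.Place))
g.add((vienna, ACDH.hasIdentifier, URIRef("http://www.wikidata.org/entity/Q1741")))
g.add((vienna, ACDH.hasIdentifier, URIRef("https://sws.geonames.org/2761367/")))

print(g.serialize(format="turtle"))
```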
(2.1) Case study: Named entities, authority files & identifiers
Metadata curation for ARCHE is an iterative process that ensures deposited datasets adhere to the ARCHE schema and provide as much information as possible, at a minimum at the top level of a data collection. A key part of this process is identifying and disambiguating named entities and enriching and linking them to persistent identifiers from external authority files wherever possible. We encourage depositors to provide additional external persistent identifiers for named entities, and we provide extended functionality for a set of currently 27 sources that fulfil requirements like API access and sustainability. Due to its broad and ever-expanding coverage, and its proven usefulness to curators in finding IDs from other authority files, Wikidata was one of the first items on this list of supported sources (the list is available in JSON format on GitHub [9]).
In practice, if a named entity has a supported external identifier, any missing information can be fetched from that source. This is valuable both for data creators and curators, because it allows the provided metadata to be enriched and normalised even when manual curation is not possible due to the large number of entities. For example, if we receive a place with the ID “Q1741” in the form of the URI “http://www.wikidata.org/entity/Q1741” without any further information, our technical setup is capable of fetching a label, further supported identifiers, and coordinates, which reveal that behind “Q1741” stands Vienna in Austria. By fetching further identifiers, duplicates in ARCHE are minimised, as another entity with the GeoNames URI “https://sws.geonames.org/2761367/” can then be merged with our newly ingested entity Q1741 (Figure 2).

Figure 2
Example of how metadata enrichment leads to merging entities in ARCHE. First a new collection with a related place is ingested and results in the creation of a new RDF node for a place. Then further information for this node is fetched. Based on a matching identifier with an existing place node, these two can be merged and the identifiers are accumulated.
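A minimal sketch of such an enrichment step, assuming Wikidata’s public Special:EntityData endpoint and the requests library; P1566 (GeoNames ID, see below) and P625 (coordinate location) are the relevant Wikidata properties, and error handling is omitted for brevity.

```python
import requests

def fetch_wikidata_entity(qid: str) -> dict:
    """Fetch label, GeoNames ID, and coordinates for a Wikidata item.

    A sketch against the public Special:EntityData endpoint; a real
    client would add error handling and a descriptive User-Agent.
    """
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    entity = requests.get(url, timeout=30).json()["entities"][qid]
    claims = entity["claims"]

    result = {"label": entity["labels"]["en"]["value"]}

    # P1566 is Wikidata's GeoNames ID property (see Section 2.1).
    if "P1566" in claims:
        geonames_id = claims["P1566"][0]["mainsnak"]["datavalue"]["value"]
        result["geonames"] = f"https://sws.geonames.org/{geonames_id}/"

    # P625 (coordinate location) yields latitude and longitude.
    if "P625" in claims:
        coord = claims["P625"][0]["mainsnak"]["datavalue"]["value"]
        result["coordinates"] = (coord["latitude"], coord["longitude"])

    return result

print(fetch_wikidata_entity("Q1741"))
# e.g. {'label': 'Vienna', 'geonames': 'https://sws.geonames.org/2761367/', ...}
```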
By fetching labels from a preferred source, a normalisation of the named entities is possible. Yet this is not straightforward: especially in the case of historic persons, the preferred version of a name, a title, or a combination of both varies from authority file to authority file, as shown in Table 1, and using a single source is not possible because the sources complement each other.
Table 1
Identifiers and preferred labels for Ferdinand I, Holy Roman Emperor.
| IDENTIFIER | PREFERRED LABEL |
|---|---|
| https://id.acdh.oeaw.ac.at/ferdinandIhrr | Ferdinand I., Heiliges Römisches Reich, Kaiser |
| http://id.loc.gov/rwo/agents/n82050986 | Ferdinand I, Holy Roman Emperor, 1503–1564 |
| http://viaf.org/viaf/51698517 | Lists 280 variants |
| http://www.wikidata.org/entity/Q150611 | Ferdinand I, Holy Roman Emperor |
| https://d-nb.info/gnd/1089109938 | Redirect to https://d-nb.info/gnd/118532502 |
| https://d-nb.info/gnd/1089740514 | Redirect to https://d-nb.info/gnd/118532502 |
| https://d-nb.info/gnd/118532502 | Ferdinand I., Heiliges Römisches Reich, Kaiser |
| https://d-nb.info/gnd/1243972378 | Redirect to https://d-nb.info/gnd/118532502 |
| https://isni.org/isni/0000000110266433 | Lists variants |
Another normalisation performed on the ARCHE side with the aforementioned list of supported authority file sources concerns the URIs themselves; it relies on rules formalised as regular expressions, as shown in Figure 3. This is required to avoid another type of duplicate, where e.g. a person was created with the Wikidata URI “https://www.wikidata.org/wiki/Q186709” and later another data creator creates another person with the URI “http://www.wikidata.org/entity/Q186709”. Both URIs refer to the same named entity, but in ARCHE disambiguation is done on the level of the URI, and without any normalisation these two URIs are interpreted as two distinct entities. An extreme example of an authority file with a large variability of URIs is GeoNames, where one identifier can be resolved with up to 14 different URIs.

Figure 3
Excerpt from the list of supported sources for external identifiers acceptable in ARCHE. Each source is described with a name, a regular expression to match and fetch the actual ID from a given URI, and a normalised URI to replace whatever URI was submitted. Furthermore, to fetch information from these sources, a URI resolving to RDF is indicated along with a request format. The list is available on GitHub: https://github.com/acdh-oeaw/arche-assets/blob/master/AcdhArcheAssets/uriNormRules.json.
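In code, applying such a rule set boils down to testing each regular expression against an incoming URI and substituting the normalised form. The sketch below uses simplified, assumed rule fields (match/replace) and patterns; the authoritative rules are in the JSON file linked above.

```python
import re

# Two illustrative rules, loosely modelled on the published
# uriNormRules.json; field names and patterns are assumptions.
NORM_RULES = [
    {
        "match": r"^https?://(www\.)?wikidata\.org/(entity|wiki)/(Q[0-9]+).*$",
        "replace": r"http://www.wikidata.org/entity/\3",
    },
    {
        "match": r"^https?://(www\.|sws\.)?geonames\.org/([0-9]+)(/.*)?$",
        "replace": r"https://sws.geonames.org/\2/",
    },
]

def normalise_uri(uri: str) -> str:
    """Return the canonical form of a known identifier URI, else the input."""
    for rule in NORM_RULES:
        if re.match(rule["match"], uri):
            return re.sub(rule["match"], rule["replace"], uri)
    return uri

# Both variants of the Wikidata URI from the example collapse to one form:
assert normalise_uri("https://www.wikidata.org/wiki/Q186709") == \
       normalise_uri("http://www.wikidata.org/entity/Q186709")
```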
In Wikidata, this kind of normalisation is achieved by using individual properties for each external authority file. For example, the property P1566 [10] is used to attach a GeoNames ID (e.g. 2761369) to an item in Wikidata. With the help of URI templates related to each property, e.g. P1630 [11] (formatter URL), the GeoNames ID can then be displayed as a clickable link pointing to “https://www.geonames.org/2761369”. Furthermore, with the property P8966 [12] (URL match pattern), regular expressions similar to those in Figure 3 can be included to extract an ID from a given URI. This open and easily accessible documentation of the URI templates provides a useful source when new identifier sources are to be added to the list for ARCHE.
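Since these templates are ordinary statements on the property items, they can be retrieved programmatically, for example with a SPARQL query against the Wikidata Query Service. The following sketch (plain requests; a dedicated SPARQL client would work equally well) is one possible way to look them up:

```python
import requests

# Ask the Wikidata Query Service for the formatter URL (P1630) and the
# URL match pattern (P8966) attached to the GeoNames ID property (P1566).
SPARQL = """
SELECT ?formatter ?pattern WHERE {
  OPTIONAL { wd:P1566 wdt:P1630 ?formatter . }
  OPTIONAL { wd:P1566 wdt:P8966 ?pattern . }
}
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "arche-curation-sketch/0.1"},  # courtesy header
    timeout=60,
)
for row in response.json()["results"]["bindings"]:
    print(row.get("formatter", {}).get("value"),
          row.get("pattern", {}).get("value"))
# Expected output includes e.g. https://www.geonames.org/$1
```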
(2.2) Case study: Reconciliation of Persons
As ARCHE has developed over time, the data and metadata curation processes have evolved as well. In the beginning, the number of named entities was manageable and a lot of semi-manual curation was still affordable. Given the increasing number of new data collections and associated metadata, and the fact that named entities do not require an external identifier, it became necessary to merge duplicates with disjoint links to related resources and to reconcile entities without identifiers. Reasons for a lack of authority file identifiers include entities such as historical persons about whom not enough information was available, or a missing awareness of LOD on the data creators’ side.
For the reconciliation, we opted for Wikidata because it is natively supported in OpenRefine [13] (Delpeuch et al., 2025), for which ARCHE also provides a Reconciliation Service API [14] endpoint via “https://arche.acdh.oeaw.ac.at/openrefine/reconcile”. As the use of Wikidata in library catalogues becomes increasingly common (Tharani, 2021; van Veen, 2019), the number of named entities stored within it grows accordingly. At the time of the reconciliation, about 2000 persons without any external identifier in ARCHE were found in Wikidata. Still, about 4500 persons remained without any match. For some, this is due to misspellings (e.g. an extra space between first and second name), abbreviations (e.g. “Eisenst.” or “Rose Fr.”), or ambiguous names lacking further information to accurately determine their respective authority identifier (e.g. “Strauss”).
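The Reconciliation Service API implemented by the ARCHE endpoint and consumed by OpenRefine is a simple HTTP protocol: a batch of queries is posted and ranked candidates are returned. A minimal sketch, assuming the community-run Wikidata reconciliation endpoint at wikidata.reconci.link (any spec-compliant endpoint accepts the same payload):

```python
import json
import requests

# One batch of reconciliation queries, following the Reconciliation
# Service API: each query has a name and an optional type constraint
# (Q5 = human on Wikidata).
queries = {
    "q0": {"query": "Ferdinand I, Holy Roman Emperor", "type": "Q5"},
    "q1": {"query": "Strauss", "type": "Q5"},  # ambiguous on purpose
}

# Endpoint URL is an assumption; the ARCHE endpoint mentioned above
# speaks the same protocol.
response = requests.post(
    "https://wikidata.reconci.link/en/api",
    data={"queries": json.dumps(queries)},
    timeout=60,
)
for key, answer in response.json().items():
    for candidate in answer["result"][:3]:
        print(key, candidate["id"], candidate["name"],
              candidate["score"], candidate["match"])
```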
Yet there remains a large portion of persons for whom authority file records simply do not exist or cannot be reliably determined, so the question of how to manage them remains. Should these be kept as individual instances of the class Person according to the ARCHE schema, should they be kept as a text value within a new datatype property, or should they be omitted?
It seems that there is no one solution that fits all; rather, this has to be decided on a case-by-case basis. For example, identifiable persons with rich additional metadata could be entered into Wikidata if they do not yet exist there. However, because ARCHE is a digital archive with all the tasks attached to the mission of long-term preservation of research data (Trognitz et al., 2024), and neither an authority file service nor a database, the resources for curating information about named entities are scarce. Thus, we rely on the provision of authority identifiers and rich metadata by the data creators. Moreover, as the Data Policy “pursues an aggressive Open Access strategy in the publication of research results” (ACDH, n.d.), researchers are advised and supported in re-using authority files and actively contributing to openly accessible services, such as Wikidata. In this way we try to encourage the use, enhancement, and provision of Linked Open Data.
(3) Controlled vocabularies for ARCHE
As mentioned in the previous section, ARCHE’s metadata schema has some object properties that are used to include terms from controlled vocabularies. These vocabularies are vital to ensuring the consistency and interoperability of metadata within ARCHE. They enable us to include uniform and multilingual (German and English as a minimum) values across diverse collections for properties denoting a resource’s access restriction, license, lifecycle status, type category, originating research discipline, or language.
The vocabularies themselves are hosted on the Skosmos-based (Suominen et al., 2015) Vocabs Service [15] of the ACDH (Zaytseva & Ďurčo, 2020) and formalised in SKOS (Isaac & Summers, 2009). Each vocabulary specifies the permitted values and their semantic relationships. Each term includes a human-readable label, a machine-readable identifier (URI), a definition of the term, and links to external concepts, e.g. Wikidata items.
Wikidata increasingly plays an important role in the creation and maintenance of ARCHE’s controlled vocabularies, serving both as a reconciliation service and as a source of authoritative identifiers. The following two case studies illustrate how this approach is applied in practice: the language code vocabulary, which ensures the consistent use of labels for ISO language codes, and the subject vocabulary, which is planned to support thematic classification across ARCHE’s diverse humanities collections.
(3.1) Case study: language code vocabulary
To denote the language of a resource, we chose the three-letter ISO 639-3 language codes (SIL Global, n.d.) because these cover living and extinct languages that might be submitted within a research data collection to ARCHE. For these codes, a SKOS vocabulary was not readily available in 2019. While Lexvo.org posed an alternative source for language information following Linked Data principles (De Melo, 2015), ARCHE’s software setup at that time required the list to be formalised in SKOS (Isaac & Summers, 2009). Thus, a list available from IANA was imported into OpenRefine and reconciled with Wikidata. Links to the items in Wikidata are included in the vocabulary with owl:sameAs, and links to the already existing SKOS vocabulary with ISO 639-1 [16] language codes are included via skos:exactMatch (Figure 4). In the course of the preparation, 18 entries identified as missing in Wikidata were created [17] via the QuickStatements tool (Wikidata contributors, 2025), which can be conveniently used via OpenRefine. The resulting vocabulary is now available at https://vocabs.acdh.oeaw.ac.at/iso6393/. It will require updating to reflect changes to the ISO 639-3 language code set over the past five years. The workflow will then include both using and contributing to Wikidata.

Figure 4
SKOS vocabulary for ISO 639-3 language codes in the ACDH Vocabs Service. Links to equivalent and exactly matching concepts are included as well.
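A sketch of how one such concept could be serialised with rdflib in Python. The concept URIs, the choice of Q188 as the Wikidata item for the German language, and the modelling details are illustrative assumptions; the published vocabulary is authoritative.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, SKOS

# URI patterns are assumptions modelled on the published vocabularies.
ISO3 = Namespace("https://vocabs.acdh.oeaw.ac.at/iso6393/")
ISO1 = Namespace("https://vocabs.acdh.oeaw.ac.at/iso6391/")
WD = Namespace("http://www.wikidata.org/entity/")

g = Graph()
g.bind("skos", SKOS)
g.bind("owl", OWL)

deu = ISO3["deu"]
g.add((deu, RDF.type, SKOS.Concept))
g.add((deu, SKOS.notation, Literal("deu")))
g.add((deu, SKOS.prefLabel, Literal("German", lang="en")))
g.add((deu, SKOS.prefLabel, Literal("Deutsch", lang="de")))
# Link to the Wikidata item for the German language ...
g.add((deu, OWL.sameAs, WD["Q188"]))
# ... and to the corresponding ISO 639-1 concept (cf. Figure 4).
g.add((deu, SKOS.exactMatch, ISO1["de"]))

print(g.serialize(format="turtle"))
```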
(3.2) Case study: subject vocabulary
To aid in better describing and finding the various datasets from heterogeneous humanities disciplines, which cover a diverse and hard-to-foresee range of subjects, the ARCHE schema currently has a datatype property acdh:hasSubject that allows a free-text value. Since ARCHE plans to use this property as a facet in the search interface, a consolidation, normalisation, reconciliation, and translation (German to English or vice versa) of all the included values is necessary. The goal for the subject vocabulary is to create a structured set of subject terms identifying what a resource or collection is about. By mapping the subject terms supplied by depositors to a controlled list, we aim to create a normalised and interoperable set of descriptors.
Two potential strategies for achieving this are currently under consideration. One approach would be a high-level controlled vocabulary in which broad, general categories could be applied and used to support faceted browsing and discovery. Alternatively, a more detailed, fine-grained, yet open vocabulary could be implemented. In this approach, the subject terms provided by the depositors would be mapped directly to external authority files and vocabularies such as Wikidata. While this approach would allow the vocabulary to grow dynamically and contain all contributions supplied by depositors, the resulting set of terms would likely not be suitable for a faceted search due to a lack of structure.
Perhaps the most effective solution will be a hybrid approach. By employing a high-level vocabulary for general categorisation, we would retain the chance to implement another useful access point, while still allowing for richer subject terminology by utilising links to external authorities. Wikidata could play a valuable role here, as most items are defined as instances of a particular subject class via the property instance of (https://www.wikidata.org/wiki/Property:P31). This would help with grouping subjects across the reconciled terms. The main challenge would be drawing the line between broad categories and fine-grained terms, which would likely require experimentation and user feedback. As this approach would combine usability with extensible semantic richness, ARCHE would be taking advantage of both approaches, enhancing discoverability while remaining connected to the wider linked data landscape.
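As a sketch of the grouping idea: once depositor-supplied terms are reconciled to Wikidata QIDs, their instance-of (P31) classes can be fetched in a single SPARQL query and used as candidate broad categories. The QIDs below are illustrative placeholders, not values from ARCHE’s actual data.

```python
import requests

# Illustrative placeholder QIDs standing in for reconciled subject terms.
qids = ["Q23498", "Q8242"]

# One query fetches the instance-of (P31) class of every reconciled term,
# which can then serve as a candidate broad category for faceting.
sparql = """
SELECT ?item ?class ?classLabel WHERE {
  VALUES ?item { %s }
  ?item wdt:P31 ?class .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de" . }
}
""" % " ".join(f"wd:{q}" for q in qids)

rows = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": sparql, "format": "json"},
    headers={"User-Agent": "arche-subject-sketch/0.1"},
    timeout=60,
).json()["results"]["bindings"]

for row in rows:
    print(row["item"]["value"], "->", row["classLabel"]["value"])
```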
(4) Metadata beyond ARCHE
To increase the findability and reusability of the data entrusted to ARCHE, the metadata is shared via a public endpoint [18] that complies with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH; Lagoze et al., 2002). Multiple aggregators are already harvesting ARCHE metadata. These include research infrastructure portals in which ARCHE staff have been and continue to be involved, thereby gaining insight into the post-processing of metadata on the portals’ side. A selection of three cases will be presented: Kulturpool, CLARIN, and ARIADNE.
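Harvesting such an endpoint requires nothing beyond standard OAI-PMH requests. A minimal sketch using the mandatory oai_dc format follows; the endpoint URL is an assumption (see note 18 for the API documentation), and a full harvester would additionally follow resumptionToken elements to page through all records.

```python
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Endpoint URL is an assumption; see note 18 for the API documentation.
ENDPOINT = "https://arche.acdh.oeaw.ac.at/oaipmh/"

response = requests.get(
    ENDPOINT,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=60,
)
root = ET.fromstring(response.content)

# Print title and identifier of each harvested record.
for record in root.iter(f"{OAI}record"):
    title = record.find(f".//{DC}title")
    identifier = record.find(f".//{DC}identifier")
    if title is not None:
        print(title.text, "|", identifier.text if identifier is not None else "-")
```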
(4.1) Kulturpool
Kulturpool [19] serves as Austria’s central search portal and competence centre for digitised cultural heritage. It aggregates and presents digital objects from a wide range of museums, libraries, and archives. It forms part of the national strategy “Kulturerbe Digital”, initiated by the Federal Ministry for Housing, Arts, Culture, Media and Sport (BMWKMS) and implemented at the Natural History Museum Vienna (NHMW).
As a national cross-domain aggregator, Kulturpool functions as an intermediary layer between local datasets and the wider landscape of European aggregation, particularly Europeana [20]. Its partner institutions vary widely in technical capacity, descriptive standards, and metadata granularity, and include institutions that operate without formal metadata management systems or controlled vocabularies. Consequently, the incoming datasets are heterogeneous in format, quality, and conceptual modelling.
Contributing institutions are not required to provide controlled vocabularies or fully standardised metadata. This open approach reflects Kulturpool’s commitment to inclusivity: institutions of all sizes and levels of technical maturity can participate, provided their data meet a minimal interoperability threshold. By prioritising accessibility over strict standardisation, Kulturpool accommodates the diversity of Austria’s cultural heritage sector.
A persistent issue across aggregated data is the inconsistent treatment of named entities – people, places, organisations, materials, and concepts. The same person may appear under multiple spellings or variants, places are often entered without geocoordinates, and institutions often lack persistent identifiers altogether. These discrepancies hinder semantic interoperability and limit the capacity to interconnect data across collections.
For Kulturpool, reconciling these inconsistencies is not only a matter of improving search precision but also of enabling a more coherent representation of Austrian cultural heritage as a whole. The effort focuses on aligning diverse institutional data with established global reference systems while respecting the autonomy and descriptive practices of the contributing partners.
For Kulturpool, Wikidata’s openness and inclusivity are central advantages. Its global, multilingual structure allows both scholarly and vernacular knowledge to coexist, which is especially relevant for cultural heritage data that bridges academic and community contexts. This pluralistic framework aligns with Kulturpool’s mission to represent the diversity of Austria’s cultural institutions within a shared semantic environment.
Kulturpool is currently developing a workflow that uses Wikidata as a central anchor for metadata harmonisation. Kulturpool employs an internal OpenSearch [21] instance to index a locally extracted subset of the Wikidata dataset derived from regular dump files. The use of a local index provides several advantages. It allows large-scale operations without external API limits or latency, supports tailored scoring mechanisms for domain-specific patterns, and preserves data sovereignty within Kulturpool’s infrastructure.
Within this workflow, each metadata field – such as creator, material, or subject – is reconciled against the corresponding Wikidata entity types. Restricting matches to relevant types improves precision and reduces the likelihood of false positives. Candidate entities are ranked based on similarity scores computed from labels, aliases, and other identifying properties. Low-confidence matches are flagged for manual verification, allowing curators to confirm, correct, or reject suggestions before identifiers are stored.
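The ranking and flagging step can be sketched independently of the index backend: given candidate entities with labels and aliases, a string-similarity score decides between automatic acceptance and manual review. Everything below, including the threshold value and the candidate data, is an illustrative assumption rather than Kulturpool’s actual implementation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Candidate:
    qid: str
    label: str
    aliases: list[str]

def score(query: str, candidate: Candidate) -> float:
    """Best string similarity between the query and the candidate's names."""
    names = [candidate.label, *candidate.aliases]
    return max(SequenceMatcher(None, query.lower(), n.lower()).ratio()
               for n in names)

REVIEW_THRESHOLD = 0.9  # arbitrary assumption; tuned per field in practice

def reconcile(query: str, candidates: list[Candidate]):
    """Return the best candidate plus an accept/review flag."""
    best = max(candidates, key=lambda c: score(query, c))
    best_score = score(query, best)
    # Low-confidence matches are flagged for manual verification.
    status = "accept" if best_score >= REVIEW_THRESHOLD else "review"
    return best.qid, best_score, status

# Illustrative candidates; the second QID is a made-up placeholder.
candidates = [
    Candidate("Q762", "Leonardo da Vinci", ["Leonardo", "da Vinci"]),
    Candidate("Q000000", "Leonardo (placeholder)", []),
]
print(reconcile("Lionardo da Vinci", candidates))
```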
The long-term goal of reconciliation extends beyond standardising identifiers. By linking named entities across datasets, Kulturpool can enrich and contextualise metadata with biographical information, images, dates, descriptions, and more. This enables users to navigate collections through semantic relationships, for example by connecting works held by multiple institutions or by tracking distribution patterns of artefacts made of similar materials. Furthermore, the reconciliation infrastructure enables Kulturpool to support participating institutions by returning enriched data that include Wikidata identifiers and normalised entity names.
The Kulturpool initiative demonstrates how Wikidata, though not a traditional authoritative source, can unify heterogeneous information under shared identifiers and provide a common semantic language across institutions. It also highlights disparities in resources: while large museums operate within established frameworks, smaller entities rely on shared infrastructures like Kulturpool, which offers reconciliation and enrichment services to promote interoperability and wider access to Linked Open Data.
(4.2) CLARIN
The Common Language Resources and Technology Infrastructure (CLARIN) [22] is a European Research Infrastructure Consortium dedicated to language data and tools. CLARIN offers a broad range of interoperable services that serve CLARIN’s Open Science agenda and adhere to the FAIR data principles. One core component of CLARIN’s semantic interoperability layer is the CLARIN Concept Registry (CCR) [23], which is relevant for serving interoperable metadata, e.g. following the Component MetaData Infrastructure (CMDI) schema (Ďurčo & Windhouwer, 2014). The CCR is a controlled vocabulary in SKOS format with a collection of concepts relevant to language resources. Originally, this vocabulary was conceived as the ISO TC 37 Data Category Registry (DCR) called ISOcat (Snijders et al., 2009). ISOcat has recently been abandoned in favour of Wikidata, which required a mapping effort that has so far yielded suitable mappings for 74% of the entries (693 out of 933). As soon as the mapping is completed, this change will also become visible in the SKOS vocabulary (Uytvanck, 2025, pp. 8–17).
(4.3) ARIADNE
The ARIADNE Research Infrastructure [24] (ARIADNE RI) operates an Open Access data catalogue, the ARIADNE portal [25], which aggregates archaeology and heritage resources from over 40 countries across four continents.
The resources in the ARIADNE portal are described using the AO-Cat Ontology (Felicetti et al., 2025), which incorporates the subject concepts of the Getty Art and Architecture Thesaurus (Getty AAT, 2017) for ARIADNE subject keywords. As the subject keywords should be available in multiple languages, the use of Wikidata was proposed, because adding labels in multiple languages is quicker and easier there than in the Getty AAT. This sparked a discussion on why ARIADNE uses the Getty AAT and not Wikidata as its main reference vocabulary, which culminated in a short feature stating that the Getty AAT was chosen for being a widely adopted archaeological standard with high-quality, curated terminology and a stable linked data implementation. While Wikidata offers advantages like flexibility, openness, and broader scope, it is more general-purpose, subject to frequent changes, and was not designed as a primary source for specialised domains. However, ARIADNE supports using Wikidata in the provided metadata, as the AO-Cat Ontology allows the inclusion of links from other sources, which are used in the indexing mechanisms of the ARIADNE Portal (Basset et al., 2020).
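A sketch of that multilingual enrichment: given a Getty AAT identifier, the matching Wikidata item can be looked up via Wikidata’s AAT ID property (P1014) and its labels harvested in as many languages as the community has provided. The property ID and the example AAT value are assumptions for illustration.

```python
import requests

def multilingual_labels(aat_id: str, languages=("en", "de", "fr", "it")) -> dict:
    """Find the Wikidata item linked to a Getty AAT ID and return its labels.

    P1014 is assumed to be Wikidata's AAT ID property; the value
    format of the identifier is likewise an assumption.
    """
    sparql = 'SELECT ?item WHERE { ?item wdt:P1014 "%s" . } LIMIT 1' % aat_id
    bindings = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": sparql, "format": "json"},
        headers={"User-Agent": "ariadne-labels-sketch/0.1"},
        timeout=60,
    ).json()["results"]["bindings"]
    if not bindings:
        return {}
    qid = bindings[0]["item"]["value"].rsplit("/", 1)[-1]

    # Fetch all labels of the item and keep the requested languages.
    entity = requests.get(
        f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json",
        timeout=60,
    ).json()["entities"][qid]
    return {lang: label["value"]
            for lang, label in entity["labels"].items()
            if lang in languages}

# e.g. multilingual_labels("300000810")  # illustrative AAT ID
```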
(5) Conclusion
With the presented case studies, ranging from reconciliation through vocabulary collation to metadata interlinking, we have illustrated the role of Wikidata as a practical anchor for metadata harmonisation and enrichment across national and international research data repositories, thematic portals, and research infrastructures.
The main strengths of Wikidata lie in its open CC0 licence, its multilinguality, its flexible data model and broad scope of topics, and its open-minded and active community. Unlike traditional authoritative sources, which are developed and curated by domain specialists with quality and authority in mind, Wikidata was never intended to be a primary source. Instead, it functions as a secondary database open to anyone for contribution and consumption. In this way, it can be regarded as a democratic “authority file” that unifies heterogeneous information under shared identifiers, providing a common semantic language across disciplines, platforms, and institutions.
Anyone can edit any item on Wikidata at any time, which, through collective improvement over time, can result in high-quality items (Shenoy et al., 2022). However, it requires domain knowledge to assess whether this already applies to items of interest, which is why proper reconciliation across broad areas of disciplines — as illustrated in our case studies of person reconciliation (Section 2.2) and subject vocabulary development (Section 3.2) — requires careful curation and validation. The quality and suitability of Wikidata depends not only on the individual items, but also on the respective use cases (Zhao, 2023). This explains why some of the presented case studies, such as the language code vocabulary (Section 3.1), could be completed efficiently, while others, like the broad humanities subject vocabulary (Section 3.2), remain ongoing projects requiring iterative refinement.
Despite the variety of aggregators discussed in Section 4 — Kulturpool, CLARIN, and ARIADNE — they are united in their recognition of Wikidata’s potential as a valuable source of linked open data and their commitment to leveraging it. The Kulturpool initiative, in particular, highlights what can be achieved through systematic integration: new modes of navigation and discovery, such as visual exploration of portraits or timeline-based browsing of biographical events, which enhance both analysis and user engagement.
Like most humanities projects (Zhao, 2023), the presented case studies primarily consume data from Wikidata, yet they also offer the opportunity to contribute back to the community. As Zhao (2023) acknowledged, expert scholarly practitioners and their projects are a valuable source of high-quality external references that enhance Wikidata’s quality and coverage. Contributing to Wikidata is straightforward: the same tools used for reconciliation, OpenRefine with its Wikidata reconciliation service and QuickStatements for batch uploads, facilitate efficient contribution workflows. The modest contribution of 18 missing language codes (Section 3.1) demonstrates how seamlessly contribution can be integrated into existing curation processes.
Looking forward, the question is not whether to use Wikidata, but how to integrate it more systematically into research data infrastructures while cultivating a beneficial relationship of consumption and contribution. By viewing Wikidata as a shared and participative infrastructure rather than merely a data source, we can ensure that it continues to grow in quality, coverage, and utility for everyone.
Notes
[1] https://arche.acdh.oeaw.ac.at/browser/ (last accessed: 30 October 2025).
[2] https://www.geonames.org/ (last accessed: 30 October 2025).
[3] https://d-nb.info/standards/elementset/gnd (last accessed: 30 October 2025).
[4] https://viaf.org/en (last accessed: 30 October 2025).
[5] https://perio.do/en/ (last accessed: 30 October 2025).
[6] https://www.wikidata.org/w/index.php?title=Help:Data_type&oldid=2314686648#External_identifier (last accessed: 30 October 2025).
[7] https://www.wikidata.org/ (last accessed: 30 October 2025).
[8] https://github.com/acdh-oeaw/arche-schema; tabular view: https://acdh-oeaw.github.io/arche-schema/; graph view: https://service.tib.eu/webvowl/#iri=https%3A%2F%2Fraw.githubusercontent.com%2Facdh-oeaw%2Frepo-schema%2Fmaster%2Facdh-schema.owl (all last accessed: 21 November 2025).
[9] https://github.com/acdh-oeaw/arche-assets/blob/master/AcdhArcheAssets/uriNormRules.json (last accessed: 30 October 2025).
[10] https://www.wikidata.org/wiki/Property:P1566 (last accessed: 30 October 2025).
[11] https://www.wikidata.org/wiki/Property:P1630 (last accessed: 30 October 2025).
[12] https://www.wikidata.org/wiki/Property:P8966 (last accessed: 30 October 2025).
[13] https://openrefine.org/ (last accessed: 30 October 2025).
[14] https://reconciliation-api.github.io/specs/latest/ (last accessed: 30 October 2025).
[15] https://vocabs.acdh.oeaw.ac.at/en/ (last accessed: 30 October 2025).
[16] https://vocabs.acdh.oeaw.ac.at/iso6391/Schema (last accessed: 30 October 2025).
[17] https://editgroups.toolforge.org/b/OR/29a9c75c/ (last accessed: 30 October 2025).
[18] https://arche.acdh.oeaw.ac.at/browser/api-access#oai-pmh (last accessed: 30 October 2025).
[19] https://kulturpool.at/ (last accessed: 15 December 2025).
[20] https://www.europeana.eu/ (last accessed: 15 December 2025).
[21] https://opensearch.org/ (last accessed: 30 October 2025).
[22] https://www.clarin.eu/ (last accessed: 30 October 2025).
[23] https://www.clarin.eu/conceptregistry/ (last accessed: 30 October 2025).
[24] https://www.ariadne-research-infrastructure.eu/ (last accessed: 30 October 2025).
[25] https://www.ariadne-research-infrastructure.eu/portal/ (last accessed: 30 October 2025).
Competing Interests
The authors have no competing interests to declare.
Author contributions
Writing – original draft: Martina Trognitz, Rachel Alyson Mandell, Seta Štuhec, Julian Palacz.
Writing – review & editing: Martina Trognitz, Rachel Alyson Mandell, Seta Štuhec.
