(1) Context and Motivation
(1.1) Samian Ware and the Archaeological Background
The Linked Open Samian Ware project is derived from the specialised archaeological primary database, Samian Research,1 which comprises 250,000 catalogued Roman potters’ stamps found at ca. 4,400 sites throughout the Roman Empire. This material was mass-produced from the 1st to the 3rd century AD and exported across the whole Roman Empire and into the so-called Barbaricum; individual products have been found as far away as India (Figure 1).

Figure 1
Distribution of Roman Samian Ware (terra sigillata) manufactured in production centres in the western Roman Empire, using Samian Research data. Allard W. Mees, CC BY 4.0.
Citizen Science contributors participate in the identification of potters’ stamps by comparing original stamped Samian vessels or photographs or rubbings of the stamps involved. The identified character strings are compared against the reference corpus provided by the Samian Research database, thus enabling scalable data enrichment.
The time-consuming identification work and the entry of relevant information into the database are not restricted to academic researchers but are supported by a tangible Citizen Science community. The LEIZA (Leibniz-Zentrum für Archäologie, formerly known as RGZM) acts as the coordinating institution and organises data curation workshops across Europe for the Samian academic research community and the Citizen Science community. Participation is free of charge. The quality of the entered data is not enforced through review procedures at LEIZA. Instead, it is ensured by the win-win situation in which the users (both scientists and the Citizen Science community) find themselves: they depend on correct entries when they use the complementary set of analytical tools offered on the Samian Research website for creating maps or statistical charts.
Since the material recorded in this resource was found throughout the Roman Empire, the georeferenced data gives deep insights into the economic history of the Roman Empire. The distribution of Samian Ware has long been recognised as a key proxy for reconstructing Roman trade networks (De Soto et al., 2025), exchange mechanisms, and market integration, particularly through pottery-based distribution studies (Flückiger et al., 2022; Wilson & Bowman, 2018; Franconi et al., 2023). Ceramic evidence is widely used to analyse connectivity, market structures, and the organisation of production, technological innovation and distribution within the Roman economy (Greene, 2000). Aspects of Roman commerce, such as least-cost routing, directed trading, and differential marketing, can be analysed using this dataset.
Differential marketing is an economic principle in which goods are mainly marketed in specific regions for economic reasons (Mees, 2024b, p. 296). A remarkable variant of Samian Ware is marbled Samian, in which the engobe was a mixture of yellowish and reddish marbled slips. Since this kind of slip was technically very challenging to apply, production would also have generated a considerable number of rejects that could not be sold. It is therefore considered a costly product. In the southern markets of the Roman Empire, this product variety reached relatively high market shares of up to 10% (Martin, 1985, 1991), whereas in the northern markets it remained a rare vessel variant in a market predominantly flooded with red slip-coated Samian Ware vessels (Figure 2).

Figure 2
Differential marketing of marbled Samian during the 1st century AD. Allard W. Mees, CC BY 4.0.
Directed trading is an economic principle in which goods are marketed towards a specific part of a supply chain (Zeitlin, 1982). The analysis of the marketing of Samian Ware has shown that most of it was sold en route at civil places like London, Augst, Colchester, Lyon and Narbonne (Figure 3). In contrast, the military Limes camps at the end of the supply chain were only marginally supplied with Samian Ware, even though the military was an attractive commercial target because of its regular salary payments (Reddé & Mees, 2022, p. 68).

Figure 3
Directed trading focused on the civil Roman towns en route towards the Roman military camps during the 1st century AD. Export distances of Samian produced at La Graufesenque and percentages of the total export per site. On the horizontal x-axis, the distance in kilometres is displayed, and on the vertical y-axis, the export percentages are shown. Allard Mees, CC BY 4.0.
A least-cost routing analysis found that Samian Ware could take detours of hundreds of kilometres en route to save on transport costs. Since river transport was, according to the Roman Egyptian papyri, at least 5 times cheaper than land transport, the Samian Ware produced at Banassac in southern Gaul and intended for sale at Carnuntum on the middle Danube took a considerable detour. Upon arriving in the territory of Germania Superior, it went downstream along the Rhine and, at Heidelberg, turned upstream along the Neckar. From the upper reaches of the Neckar, only a short overland stretch remained to reach the Danube (Mees, 2024a, pp. 258–259). Although the route straight through the Black Forest towards the Danube would have been geographically much shorter, the route passing modern Heidelberg was river-bound and therefore considerably cheaper (Figure 4). Since Banassac Samian Ware is found only very occasionally north of Heidelberg, its distribution can be explained by least transport cost.
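The cost argument can be illustrated with a small calculation. The sketch below takes the cited river-to-land ratio at its lower bound of 1:5 and uses purely hypothetical leg distances; they are assumptions for illustration, not measurements of the Banassac to Carnuntum routes.

```python
# Relative transport cost per kilometre. The text states that river transport
# was at least 5 times cheaper than land transport; 5 is used as a lower bound.
RIVER_COST_PER_KM = 1.0
LAND_COST_PER_KM = 5.0


def route_cost(river_km: float, land_km: float) -> float:
    """Return the relative cost of a route made of river and overland legs."""
    return river_km * RIVER_COST_PER_KM + land_km * LAND_COST_PER_KM


# Hypothetical distances for illustration only (not measured values).
direct_overland = route_cost(river_km=0, land_km=200)  # short, all overland
river_detour = route_cost(river_km=450, land_km=50)    # longer, mostly by river

print(direct_overland, river_detour)
```

Under these assumed figures, the detour is 2.5 times longer (500 km against 200 km) yet costs 700 instead of 1,000 relative units, which is the kind of trade-off the least-cost analysis describes.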

Figure 4
Least-cost routing of Samian Ware from Banassac to Carnuntum. Allard Mees, CC BY 4.0.
(1.2) Samian Ware within the Federated Knowledge Graph Ecosystem
The Linked Open Samian Ware (LOSW) initiative operates within the Federated Knowledge Graph Ecosystem (FKGE) that aims to interconnect disciplinary datasets, community-maintained repositories, and institutional infrastructures through shared semantic standards (Figure 5). This federated design avoids data centralisation by promoting interoperability between decentralised, semantically aligned repositories (Fischer et al., 2025). The FKGE comprises three components: (1) LOD Triplestores (Berners-Lee et al., 2001) and Solid Pods (Sambra et al., 2016), (2) Wikibase instances (Vrandecic, 2013; Vrandečić & Krötzsch, 2014), and (3) FAIR Digital Objects (FDOs) (De Smedt et al., 2020; Schwardmann, 2020).

Figure 5
Scheme of the Federated Knowledge Graph Ecosystem. Florian Thiery and Andreas Noback, CC BY 4.0.
In the case of Samian Research, three core data layers participate in this ecosystem: (i) the domain-specific Linked Open Data (LOD) graph hosted via archaeology.link, (ii) the community-based Wikidata representation, and (iii) the fuzzy-sl Wikibase dedicated to modelling spatial uncertainty and provenance. These layers are semantically harmonised through, on the one hand, (j) the CIDOC Conceptual Reference Model (CIDOC CRM) for semantic interoperability within the cultural heritage domain (Bekiari et al., 2024) and, on the other hand, (jj) the Object Core Metadata Profile, based on the German National Research Data Infrastructure (NFDI) Core Metadata Profile, which provides crosswalks between DCAT, schema.org, and DataCite, as well as (jjj) the Material Cultural Heritage Crosswalk Ontology, which crosswalks CIDOC CRM to other related ontologies such as PROV-O,2 BFO,3 NFDIcore Ontology4 (Tietz et al., 2025; Steller et al., 2025) and the Archaeo-Natural Ontology ArNO5 (thor Straten et al., 2025; thor Straten & Thiery, 2025), ensuring conceptual consistency and traceable data lineage (Thiery, Gerber, et al., 2025; Mempel-Länger et al., 2025). At the infrastructure level, the Samian corpus is integrated into the NFDI4Objects Knowledge Graph (N4O-KG), which serves as a domain graph within the archaeology and cultural heritage-related consortium NFDI4Objects (Thiery, Mees, Weisser, et al., 2023) within the NFDI (Hartl et al., 2021). The N4O-KG (Voß et al., 2024) builds on the Knowledge Graph Infrastructure for NFDI (KGI4NFDI), which is based on a shared ontology and a persistent-URI management system that connects heterogeneous humanities and cultural heritage datasets. Through this integration, LOSW acts not merely as an isolated data publication but as a reusable, queryable component within a distributed digital research landscape, accessible through federated SPARQL endpoints.
Beyond the data itself, the federated ecosystem encompasses FDOs, persistent, machine-actionable entities that encapsulate not only data but also, e.g., software and measurement artefacts. Within the Samian framework, these FDOs represent the executable and instrumental components that underpin the data transformation and analysis process; yet the FDO creation is still a work in progress. They include:
Python FAIRification scripts (Thiery, Mees, Gottwald, et al., 2023), which convert CSV exports from the Samian Research database into RDF according to the Linked Open Samian Ware ontology. Each script is versioned, published on GitHub and Zenodo, and carries metadata about its dependencies, runtime, and authorship, thereby ensuring complete transparency and reproducibility.
Jupyter Python Minions (Thiery, 2024, 2025) are small, modular notebooks that execute SPARQL queries, analyse Wikidata entries, and visualise Samian distribution patterns within interactive environments. They operationalise FAIR4RS principles (Barker et al., 2022) by providing executable workflows that connect structured data and computational interpretation.
ColdFusion and JavaScript modules within the original Samian Research database analysis toolbox, which perform statistical aggregations and dynamic web-based visualisations.
Measurement data from Reflectance Transformation Imaging (RTI) and the Laser Aided Profiler, which capture the physical characteristics of potter’s stamps and vessel form profiles. These datasets serve as the empirical foundation of the digital corpus. They are referenced via persistent identifiers, ensuring that derived analytical data can always be traced back to their sources.
Together, these FDOs form the operational backbone of the Samian ecosystem. They link how data are produced, transformed and interpreted with what the data represent in semantic terms. By combining data, software, and measurement artefacts under a shared FAIR framework, the Samian project exemplifies how archaeological knowledge creation can be made both reproducible and interoperable.
Technically, the FKGE is sustained by persistent identifier services, cross-referenced ontologies, and synchronised RDF endpoints that enable bidirectional discovery between LOD datasets, Wikibase instances, and FDO registries. The result is a distributed yet coherent digital research space in which each layer, from data curation to software execution, is citable, queryable, and reusable.
This combination of semantic alignment, open-source tooling, and FDO integration underpins the methodological approach outlined in the following sections. It ensures that the LOSW corpus is not merely a static publication of archaeological data, but an active, evolving component of a global knowledge infrastructure dedicated to open and FAIR archaeology.
(2) Dataset description
Repository location
https://doi.org/10.5281/zenodo.4305708 (Linked Open Data); https://www.wikidata.org/wiki/Wikidata:WikiProject_Linked_Open_Samian_Ware (Wikidata); https://fuzzy-sl.wikibase.cloud/wiki/Project_SamianResearch (fuzzy-sl Wikibase)
Repository name
Zenodo; Wikidata; fuzzy-sl Wikibase
Object name
Linked Open Samian Ware
Format names and versions
RDF (Linked Open Data); Wikidata and fuzzy-sl entries.
Creation dates
From 2020-12-04 (Linked Open Data) to now; from 2020-12-01 (Wikidata) to now; from 2024-10-23 (fuzzy-sl Wikibase) to now.
Dataset creators
Brian Hartley – University of Leeds (data curation), Brenda Dickinson – University of Leeds (data curation), Geoffrey Dannell – Nottingham University (data curation), Philip Kenrick – Oxford University (data curation), Samian Research Community (data curation), Dennis Gottwald – Johannes Gutenberg University Mainz (Linked Open Data), Florian Thiery – Leibniz-Zentrum für Archäologie (Linked Open Data; Wikidata, fuzzy-sl Wikibase), Allard W. Mees – Leibniz-Zentrum für Archäologie (Linked Open Data; data curation, Wikidata)
Language
English
License
Samian Research m-DPPL (Samian Research); https://www1.rgzm.de/ips/Licensing/DPPL/DPPLLicenseSamian_English.html (Linked Open Data); CC0 (Wikidata); CC BY 4.0 (fuzzy-sl Wikibase)
Publication date
Initial release on GitHub and Zenodo 2020-12-04, last release on GitHub and Zenodo 2024-09-24 (Linked Open Samian Ware); Initial release on Wikidata 2020-12-01; Initial release on the fuzzy-sl Wikibase 2024-10-23.
(3) Method
(3.1) Conceptual approach and data transformation workflow
The methodological strategy builds on a staged transformation from a primary database to a knowledge graph. Samian Research provides the curated, domain-specific corpus of potters’ stamps, vessel form types and associated contexts. This corpus is FAIRified (Wilkinson et al., 2016) by exporting core entities and relations into the Resource Description Framework (RDF), using a domain ontology aligned with CIDOC CRM to ensure semantic compatibility with cultural-heritage infrastructures. From there, selected entities are integrated into Wikidata as a secondary database and community hub, where they gain global identifiers, multilingual labels, and cross-project links. A complementary fuzzy-sl Wikibase (Thiery et al., 2024) is used when point coordinates, or simple assertions, cannot adequately represent spatial uncertainty or contextual ambiguity. Operationally, the LOSW workflow (Figure 6) proceeds as follows: (i) extract and normalise records from the primary database; (ii) generate persistent URIs and transform records into RDF according to the Samian ontology; (iii) publish the LOD, documentation and code with versioning; (iv) create and maintain Wikidata items for discovery sites, production centres and kiln regions with explicit back-links to the LOD URIs; (v) model uncertain or composite locations in fuzzy-sl, linking them back to both Wikidata and the LOD layer (Middle et al., 2025; Schmidt et al., 2022; Thiery, Rossenova, et al., 2025; Thiery & Thiery, 2023). Throughout, the focus is on interoperability rather than centralisation: each layer remains authoritative for its purpose, and alignment is achieved through shared identifiers, explicit mappings, and reproducible scripts.
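Steps (i) and (ii) of the workflow can be sketched in a few lines. This is a minimal illustration rather than the project's published FAIRification scripts: the base URI, the column names `id` and `name`, and the class name used as the type are all assumptions, and plain string formatting stands in for an RDF library.

```python
import csv
import io

# Hypothetical base URI and column names; the real export schema may differ.
BASE = "http://example.org/samian/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def mint_uri(kind: str, record_id: str) -> str:
    """Step (ii): derive a stable URI from the entity kind and the database key."""
    return f"{BASE}{kind}_{record_id.strip()}"


def record_to_ntriples(row: dict) -> list[str]:
    """Turn one normalised CSV row into N-Triples lines."""
    subject = mint_uri("site", row["id"])
    # Escape backslashes and double quotes so the literal stays valid N-Triples.
    label = row["name"].strip().replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"<{subject}> <{RDF_TYPE}> <{BASE}DiscoverySite> .",
        f'<{subject}> <{RDFS_LABEL}> "{label}" .',
    ]


# Step (i): read and normalise records (an in-memory CSV stands in for the export).
export = io.StringIO("id,name\n 42 ,Example Site\n")
triples = [t for row in csv.DictReader(export) for t in record_to_ntriples(row)]
print("\n".join(triples))
```

Deriving the URI only from the entity kind and the stable database key means that re-running the export yields the same identifiers, which is what makes the published URIs citable across versions.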

Figure 6
LOSW Linked Pipe as Data Flow Diagram. Florian Thiery, Timo Homburg, Martina Trognitz, CC BY 4.0.
(3.2) Linked Open Data as RDF
The LOSW graph implements a part of the Linked Archaeological Data Ontology (LADO) aligned with CIDOC CRM (Thiery, Mees, Gottwald, et al., 2023; Thiery & Mees, 2024). Three types structure the corpus: (i) information carriers (e.g., vessel sherds having stamps), (ii) places (e.g., discovery site, production centre), and (iii) actors (e.g., potters, pottery lessors). Vessels serve as information carriers, which carry inscriptions representing an actor entity, e.g., a single potter, a lessor or a potter company. Such information carriers were discovered at a specific site and originally produced by an actor entity in a determined kiln site. The RDF layer assigns citable HTTP URIs to each entity and relation. Core relations include links from a single object to its inscription (and reading), to its vessel form type, to its discovery site, to its production centre(s), and to actor entities at production centre(s). Some of the relations are modelled using “AND,” “OR” or “?” strings to represent uncertainty, vagueness, and ambiguities (Thiery et al., 2022). This is modelled using the Academic Meta Tool (AMT) ontology (Thiery & Mees, 2023; Unold et al., 2019), seen in Figure 7 as vague Dragendorff vessel form type attributions.6 Spatial information is provided as GeoSPARQL geometries and, in many cases, complemented by references to external gazetteers, such as Pleiades, for ancient toponyms. The ontology file specifies classes, properties, and expected value types; Python scripts in the repository implement the extraction, transformation, and URI-minting logic, producing publishable RDF dumps and human-readable documentation pages. This LOD layer constitutes the canonical, citable representation of the Samian corpus and is designed for federation with external triplestores, including knowledge graphs maintained within research infrastructure contexts.
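How such strings can be split into explicit alternatives before they are expressed in RDF may be sketched as follows. The parser is a plain-Python stand-in written for this illustration; it is not the AMT ontology or the project's own code, the class and function names are hypothetical, and strings that mix "AND" and "OR" are deliberately not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormAttribution:
    """Parsed attribution: candidate form types, how they combine, and doubt."""
    forms: tuple[str, ...]
    operator: str    # "OR", "AND", or "SINGLE"
    uncertain: bool  # True when the string carries a "?" marker


def parse_form_string(raw: str) -> FormAttribution:
    """Split a form type string on its logical connector."""
    uncertain = "?" in raw
    text = raw.replace("?", "").strip()
    for operator in ("OR", "AND"):
        separator = f" {operator} "
        if separator in text:
            forms = tuple(part.strip() for part in text.split(separator))
            return FormAttribution(forms, operator, uncertain)
    return FormAttribution((text,), "SINGLE", uncertain)


print(parse_form_string("15/17 OR 18 OR 18/31"))
```

Each parsed alternative can then become its own relation in the graph, so that a query for form 18 also finds vessels whose attribution is only one of several candidates.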

Figure 7
A: Schematic representation of AMT modelling. B: Samian vessel form type string “15/17 OR 18 OR 18/31” according to the Dragendorff form type catalogue as RDF and SPARQL. Florian Thiery, Allard W. Mees, and Dennis Gottwald, CC BY 4.0.
(3.3) Wikidata entries and semantic integration
Wikidata is used as a linking hub and secondary database to expose Samian entities to a broader, multilingual community and to interconnect them with adjacent resources (Schmidt et al., 2022). Three item types are central to this integration: discovery sites, production centres, and kiln regions. Each Wikidata item is typed via “instance of” (wd:P31) and connected to the Samian LOD URI via “exact match” (wd:P2888), ensuring unambiguous correspondence. Coordinate locations are added where appropriate, and part-whole relations group the items into a single project. Selected items exemplifying this structure include a production centre, a discovery site, and a kiln region that mirror the three principal classes of the LOD graph. Wikidata’s statement model (Figure 8) supports qualifiers and references.
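A query for items typed and linked in this way might be assembled as in the following sketch. It assumes that discovery sites are instances of the class Q102202066 named in the results section, and it uses the `wdt:` prefix that the Wikidata Query Service defines for direct property statements; the function names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_items_query(class_qid: str, limit: int = 10) -> str:
    """Return a SPARQL query for items of a class with their exact-match URI."""
    return (
        "SELECT ?item ?lodUri ?coord WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} ;\n"
        "        wdt:P2888 ?lodUri .\n"
        "  OPTIONAL { ?item wdt:P625 ?coord . }\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


query = build_items_query("Q102202066", limit=5)
print(query)
print(build_request_url(query))
```

The coordinate is requested in an OPTIONAL block because not every item carries one, while the exact-match link to the LOD graph is required.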

Figure 8
A labelled graphic showing the organisation of information on Wikidata with the Decebalus suicide Samian vessel. Florian Thiery & Allard W. Mees, CC BY 4.0, via Wikimedia Commons.7
These aspects can be demonstrated by a decorated Samian vessel (Q137392997; Vernhet, 1981) made by the potter L. Cosius, which was found at the Gaulish production centre of La Graufesenque (Q677814). The vessel depicts the suicide of the Dacian king Decebalus and mentions the honorary title Parthicus of the Roman emperor Trajan (Figure 9). This implies several candidate inception dates: the general range of the working period of the potter (AD 90–125), the historical date of the suicide of the Dacian king (AD 106), and the year in which the title Parthicus was awarded (AD 116) (Cassius Dio, 1925).

Figure 9
Rubbing of a decorated Samian vessel depicting the suicide of the Dacian king Decebalus during the reign of Trajan. Allard W. Mees, CC BY 4.0, via Wikimedia Commons.8
Items are maintained under the WikiProject Linked Open Samian Ware (Mees & Thiery, 2022), which documents modelling conventions, preferred properties and reconciliation practices. Data creation and updates follow reproducible routines and maintain bidirectional alignment: Wikidata items carry explicit links to the LOD graph, and the LOD documentation resolves back to the corresponding Wikidata identifiers.
(3.4) Fuzzy Spatial Locations Wikibase
The integration of the fuzzy-sl Wikibase (Thiery et al., 2024; Thiery, Schenk, et al., 2025) addresses a specific spatial challenge that arises when linking archaeological data to external gazetteers such as Pleiades9 or GeoNames10. Samian Research usually records discovery sites with generic modern geographic coordinates that reflect where objects were actually found. This ignores existing differences between the locations of, e.g., the Roman and Medieval phases within a town like Budapest, and raises the problem of distinguishing place as concept from location as physical site. The fuzzy-sl Wikibase resolves this discrepancy by providing an intermediate spatial layer that explicitly represents coordinates, precision, and provenance of geospatial information (Figure 10). Each fuzzy-sl entry corresponds to a site-level entity derived from the Samian LOD graph and is modelled as an fsl Q-Entity linked to a georeferencing event (fsl:P4 has coordinate; fsl:P7 method used) performed by an acting person (fsl:P14). The fuzzy-sl data model extends CIDOC CRM and PROV-O with properties such as fsl:P25 (precision), fsl:P5 (certainty level) and fsl:P13 (certainty description). This structure allows the source and degree of accuracy for every coordinate to be explicitly recorded. For example, the item representing London/Londinium (fsl:Q101) refers to the general Pleiades place of London. At the same time, specific findspots (fsl:Q95: No. 1 Poultry and fsl:Q103: Shadwell/Docks) denote individual Samian discovery sites with different spatial origins, one derived from a Historic England listing, the other from an OpenStreetMap way. By encoding such relationships, the fuzzy-sl Wikibase creates a transparent provenance chain for geospatial data. In practice, this system acts as a semantic bridge between the abstract place concepts used in gazetteers and the empirical coordinates recorded in archaeological databases.
It preserves the integrity of the Samian LOD model while adding a layer of spatial metadata that quantifies precision and uncertainty. Each fuzzy-sl item is bidirectionally linked to its corresponding Wikidata and LOD entries via owl:sameAs and wd:P2888 exact matches, enabling federated SPARQL queries that combine conceptual and spatial levels of description. The result is a flexible, interoperable representation of findspot geographies that supports both human interpretation and machine processing across the broader Knowledge Graph ecosystem.
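The shape of such a record can be sketched with two small data classes. The field names are chosen for this illustration and only loosely follow the fuzzy-sl properties named in the comments; every value is a placeholder and none is taken from the Wikibase itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoreferencingEvent:
    """One coordinate assertion with its provenance and quality statements."""
    latitude: float
    longitude: float
    method: str                 # how the coordinate was obtained (cf. fsl:P7)
    agent: str                  # who performed the georeferencing (cf. fsl:P14)
    certainty_level: str        # cf. fsl:P5
    certainty_description: str  # cf. fsl:P13


@dataclass(frozen=True)
class FuzzySite:
    """A site-level entity with links to its LOD and Wikidata counterparts."""
    label: str
    lod_uri: str
    wikidata_qid: str
    event: GeoreferencingEvent


def provenance_statement(site: FuzzySite) -> str:
    """Summarise where a site's coordinate comes from and how reliable it is."""
    e = site.event
    return (f"{site.label}: ({e.latitude}, {e.longitude}) via {e.method} "
            f"by {e.agent}; certainty {e.certainty_level}")


# Placeholder values for illustration only.
site = FuzzySite(
    label="Example findspot",
    lod_uri="http://example.org/samian/site_42",
    wikidata_qid="Q0",
    event=GeoreferencingEvent(51.5, -0.09, "digitised from a map",
                              "Example Person", "medium",
                              "street-level accuracy assumed"),
)
print(provenance_statement(site))
```

Keeping the coordinate inside an event object, rather than directly on the site, mirrors the idea that a coordinate is an assertion made by someone with a given method, not an intrinsic attribute of the place.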

Figure 10
Schematic data model of the fuzzy-sl Wikibase. Florian Thiery, CC BY 4.0.
(3.5) Methodological reflection: potentials and limitations
The combined use of the LOSW, Wikidata and fuzzy-sl Wikibase constitutes a multilayer approach to semantic interoperability in archaeological research. Each layer fulfils a distinct function while contributing to a shared federated infrastructure. The LOD graph provides an authoritative, CIDOC-conformant representation of the dataset, anchoring domain knowledge in well-established ontologies. Wikidata serves as a linking hub that extends visibility and interconnectivity, embedding the dataset in a global semantic network with community governance and multilingual access. The fuzzy-sl Wikibase adds a specialised layer for geospatial precision and provenance, bridging the gap between conceptual place data and empirical findspot coordinates. This architecture demonstrates several potentials. First, it supports transparent, FAIR-compliant data publication with machine-actionable links between curated and community-maintained resources. Second, it enables semantic reconciliation between heterogeneous data cultures from relational archives to collaborative knowledge graphs. Third, it provides a blueprint for integrating uncertainty and context into digital archaeological workflows, a prerequisite for credible spatial analysis in the humanities. At the same time, limitations persist: Wikidata’s open model introduces variability in data quality and schema consistency; the fuzzy-sl extension requires specialist maintenance and ongoing alignment with standard ontologies; and the overall federated approach demands technical infrastructure for long-term synchronisation and versioning. Despite these challenges, the method demonstrates a viable approach for connecting specialist archaeological datasets to global semantic networks without compromising domain integrity. By making data linkable, traceable, and context-aware, the Samian approach turns Wikidata and fuzzy-sl from passive repositories into active instruments for knowledge integration and reuse. 
This reflective framework provides the foundation for the results presented in the following section.
(4) Results & Discussion
The results of the LOSW integration highlight the semantic and community processes that enable interoperability among a specialised archaeological dataset, global LOD and Wikidata resources, and domain-specific extensions. Building upon the methodological framework described above, this section presents the key outcomes of ontology implementation, data publication, and semantic alignment across the three main layers: LOSW, Wikidata, and the fuzzy-sl Wikibase.
(4.1) Ontology implementation and graph structure
The LOSW ontology, as part of LADO (Thiery & Mees, 2024), implements a lightweight extension of CIDOC CRM (crm) and PROV-O, forming the conceptual backbone of the dataset. The ontology defines the principal classes (Figure 11), e.g., Information Carrier, Inscription, Potform, Actor, Place, DiscoverySite, ProductionCentre, and KilnRegion (Thiery & Mees, 2020). lado:DiscoverySite corresponds to crm:E53 Place and is linked via lado:disclosedAt to lado:InformationCarrier (mapped to crm:E22). lado:ProductionCentre represents the location where Samian vessels were produced (Figure 12), while lado:KilnRegion aggregates multiple centres into a broader area of manufacture. The model explicitly encodes provenance via PROV-O, ensuring transparency in the derivation of the data by the Python scripts. Spatial attributes are recorded using GeoSPARQL WKT literals. This implementation demonstrates that domain ontologies and LOD standards can coexist in an archaeology-specific knowledge ecosystem without sacrificing interpretability or precision.
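The form of a GeoSPARQL WKT literal can be shown with a short helper. The function name and the sample coordinates are assumptions for illustration; the datatype URI is the one defined by GeoSPARQL, and the axis order follows its default CRS84 reference system, which puts longitude first.

```python
GEO_WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"


def point_wkt_literal(latitude: float, longitude: float) -> str:
    """Return a GeoSPARQL WKT point literal in Turtle/N-Triples syntax.

    Without an explicit CRS URI, GeoSPARQL assumes CRS84, whose axis order
    is longitude first, then latitude.
    """
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("coordinate out of range")
    return f'"POINT({longitude} {latitude})"^^<{GEO_WKT_LITERAL}>'


# Hypothetical coordinates, not taken from the dataset.
print(point_wkt_literal(latitude=44.1, longitude=3.08))
```

Taking latitude and longitude as named arguments and swapping them inside the helper guards against the common mistake of writing the point in latitude-first order.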

Figure 11
Semantic overview of the LOSW Ontology. Florian Thiery, Allard W. Mees, Dennis Gottwald, CC BY 4.0.

Figure 12
A: SPARQL query within the NFDI4Objects Knowledge Graph. B: Visualisation of Samian Research production centres and their kiln regions in QGIS. Florian Thiery, CC BY 4.0.
(4.2) Wikidata integration results
Wikidata serves as a community-driven interface to the Samian dataset, translating domain semantics into a globally accessible data structure. The integration comprises three main classes of items (Figure 13): discovery sites (wd:Q102202066), production centres (wd:Q102202026), and kiln regions (wd:Q102201947). These are interlinked to Samian Research (wd:Q90412636) via wd:P361. Core properties used in this mapping include, e.g., wd:P625 coordinate location, wd:P706 located in/on physical feature and wd:P3896 geoshape. Each entry has a backlink via wd:P2888 to the primary database. The integration currently covers ~4,000 Wikidata items with geospatial information, of which about 80% include explicit coordinates and almost all reference their LOD counterparts. The use of Wikidata introduces a significant community dimension. Contributors from both academic and citizen-science backgrounds collaborate through the WikiProject Linked Open Samian Ware (LOSW), which documents modelling guidelines and monitors property use. This open governance ensures ongoing quality assurance and fosters transparency in data curation. The multilingual environment of Wikidata also broadens accessibility, allowing item labels and descriptions to be translated across European languages, a key factor in increasing the dataset’s reuse potential.
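The rounded coverage figure can be checked against the per-class counts given in the caption of Figure 13. The short calculation below uses those three counts and the stated share of about 80%; the derived number of items with coordinates is therefore an estimate, not a reported value.

```python
# Item counts per class as given in the caption of Figure 13.
counts = {"discovery sites": 3876, "production centres": 103, "kiln regions": 11}

total_items = sum(counts.values())

# "About 80%" of the items are stated to include explicit coordinates.
estimated_with_coordinates = round(total_items * 0.80)

print(total_items, estimated_with_coordinates)
```

The three classes add up to 3,990 items, which is consistent with the rounded figure of ~4,000.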

Figure 13
A: 3876 LOSW Discovery Sites. B: 103 LOSW Production Centres. C: 11 LOSW Kiln Regions on Wikidata, queried via (A) https://w.wiki/6EBu; (B) https://w.wiki/6EBx; (C) https://w.wiki/FmeX on 2025-10-22; CC0, Wikidata Community.
(4.3) Fuzzy-sl representation and spatial modelling
The fuzzy-sl Wikibase provides a dedicated layer for managing the precision, uncertainty, and provenance of spatial data. While the Samian LOD graph and Wikidata record discrete points or polygons for findspots and production centres, these representations cannot indicate how precise the coordinates are or how they were derived. The fuzzy-sl model resolves this limitation by introducing attributes that qualify each coordinate with numerical and textual precision indicators. Every fuzzy-sl item represents a site-level entity linked to its LOD and Wikidata counterparts. The model extends CIDOC CRM and PROV-O within the Wikibase ecosystem through custom properties such as fsl:P4 has coordinate, fsl:P13 certainty description, and fsl:P23 precision. For example, the entries London/Londinium (fsl:Q101), Shadwell/Docks (fsl:Q103), and No. 1 Poultry (fsl:Q95) show different origins and levels of accuracy for the coordinate data. Each item also records the method used (fsl:P7) and the agent responsible (fsl:P14), ensuring that all geospatial assertions have traceable provenance. By linking a fuzzy-sl item to both the Pleiades place (e.g., “London/Londinium”, fsl:Q101) and the Samian findspots (“No. 1 Poultry”, fsl:Q95; “Shadwell/Docks”, fsl:Q103), the system separates conceptual from empirical information (Figure 14). The fuzzy-sl implementation thus ensures that spatial information within the Samian corpus remains both semantically rigorous and scientifically transparent.

Figure 14
A: Samian Ware findspots in the fuzzy-sl Wikibase from London and Southwark. B: findspots and potter dies; Licence: map: CC BY 4.0, fuzzy-sl Wikibase; potter dies: LEIZA/Samian Research.
(4.4) Cross-system interoperability and identifier ambiguity
The federated integration of the Samian dataset across RDF, Wikidata and fuzzy-sl demonstrates the benefits and limitations of distributed semantic infrastructures. Alignment between the three layers relies on explicit property mappings (wd:P2888, owl:sameAs, fsl:P10) and on regular reconciliation routines to ensure persistence and data integrity. The result is a knowledge graph architecture in which archaeological, spatial, and conceptual information can be queried together, supporting advanced analyses of production and distribution patterns without duplicating data. A key outcome of this cross-system approach is the exposure of identifier ambiguity, most prominently illustrated by the Corinth case. In the Samian Research LOD, Corinth (samian:loc_ds_1003935) refers to a specific discovery site where Samian pottery has been documented. In Wikidata and related gazetteers, however, several entities coexist: the ancient city (wd:Q1363688), the modern settlement (wd:Q22681231), the archaeological site (wd:Q101834062), and the excavation project (wd:Q5170664). Each represents a distinct conceptual layer (ancient place, modern village, physical excavation site, and institutional research activity), yet they are often conflated during automated reconciliation. Initially, a dedicated item (wd:Q101834062 – Samian Ware Discovery Site Corinth, now merged with the old entry wd:Q103160025) was created to preserve this specificity. However, subsequent community edits merged it with the general archaeological site entry. This process resulted in a semantic flattening: the original Samian context, its provenance and its explicit LOD reference were absorbed into a broader, less precise category. The event underscores the tension between openness and precision in community-maintained graphs. To mitigate this, the Samian research team proposes maintaining distinct entities for different conceptual levels, linked through explicit relationships rather than merges.
Recommended properties include wd:P2888 exact match for data-level correspondences, wd:P625 coordinate location for precise spatial anchoring, and wd:P793 significant event to link excavation activities to their sites. Within the Samian ecosystem, fuzzy-sl items reinforce this differentiation by assigning uncertainty values and provenance to individual coordinates, thereby preserving the granularity of discovery contexts. The Corinth example highlights both the potential and fragility of open knowledge graphs: they enable broad connectivity but depend on disciplined semantic modelling to prevent conceptual drift. As a result, the Samian community advocates the introduction of a dedicated archaeology.link property in Wikidata to permit multiple controlled exact matches to external LOD URIs. Such mechanisms would enhance cross-graph consistency and protect the integrity of domain-specific datasets within global, federated infrastructures.
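A reconciliation routine can surface such cases before any link is written. The following sketch is one possible check, not the project's own tooling: it groups candidate matches by LOD URI and reports every URI with more than one candidate for manual review. The function name and the tuple layout are assumptions; the identifiers are those quoted for the Corinth case.

```python
from collections import defaultdict


def find_ambiguous_matches(
    candidates: list[tuple[str, str, str]],
) -> dict[str, list[tuple[str, str]]]:
    """Group candidates by LOD URI and keep only URIs with several candidates.

    Each candidate is (lod_uri, wikidata_qid, conceptual_level).
    """
    grouped = defaultdict(list)
    for lod_uri, qid, level in candidates:
        grouped[lod_uri].append((qid, level))
    return {uri: hits for uri, hits in grouped.items() if len(hits) > 1}


corinth = "samian:loc_ds_1003935"
candidates = [
    (corinth, "Q1363688", "ancient city"),
    (corinth, "Q22681231", "modern settlement"),
    (corinth, "Q101834062", "archaeological site"),
    (corinth, "Q5170664", "excavation project"),
]

for uri, hits in find_ambiguous_matches(candidates).items():
    print(uri, "needs manual review:", hits)
```

Carrying the conceptual level alongside each identifier keeps the distinction between place, settlement, site and project visible to the reviewer, which is the information that a merge would otherwise discard.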
(5) Implications & Applications
The federated integration of LOSW across the LOD, Wikidata and fuzzy-sl ecosystem demonstrates that FAIR, community-driven infrastructures can sustain complex archaeological data without compromising domain precision. The approach illustrates how specialist datasets can remain authoritative within a shared knowledge graph environment, provided that semantic mappings, provenance and data versioning are explicitly maintained. From an application perspective, this architecture now enables joint geospatial and semantic analyses across multiple corpora. Using the SPARQLing Unicorn Research Toolkit (Thiery, Schenk, et al., 2025, pp. 117–118; Thiery, Fricke, et al., 2025, pp. 6–8; Thiery, Schubert, et al., 2025, pp. 9–13), particularly the SPARQLing Unicorn QGIS Plugin (Thiery & Homburg, 2024; Homburg & Thiery, 2025), Samian data can be queried and visualised directly from LOD resources such as Wikidata (Figure 15) or the NFDI4Objects Knowledge Graph (Voß et al., 2024). One application is the combined analysis of Samian finds and amphora finds from the database of the CEIPAC research group (Centro para el Estudio de la Interdependencia Provincial en la Antigüedad Clásica), where shared potter and workshop identifiers allow comparative mapping of potters and their trade networks via the Roman Open Data SPARQL endpoint (CEIPAC research group & EPNet Project, 2018). Both the Samian Research and CEIPAC databases cover find distributions on a Europe-wide level, opening up new cross-disciplinary research on Roman economic history. For instance, a search for the distribution of pottery products stamped with the potter's name Vitalis shows the overlap in sites where Gaulish Samian and amphorae from Baetica have been found, visualised through federated SPARQL queries integrating the Samian Research and CEIPAC datasets (Figure 16).
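Once both result sets have been retrieved, the site overlap described for the Vitalis search reduces to a set intersection. The following Python sketch makes this step explicit under stated assumptions: the Find record, the site keys and all rows are hypothetical placeholders, not Samian Research or CEIPAC data, and the matching on the bare potter name is a simplification that ignores the homonym numerals (such as Vitalis i and Vitalis iii) used in Samian Research.

```python
from typing import NamedTuple


class Find(NamedTuple):
    site: str    # hypothetical site key assumed to be shared by both sources
    potter: str  # potter name as recorded, e.g. "Vitalis i"


def base_name(potter):
    """Drop homonym numerals such as 'i' or 'iii' and normalise case."""
    parts = potter.split()
    return parts[0].casefold() if parts else ""


def sites_for(finds, name):
    """Collect the sites at which a potter of the given name is attested."""
    return {f.site for f in finds if base_name(f.potter) == name.casefold()}


def shared_sites(samian, amphorae, name):
    """Sites where both corpora record a potter of this name."""
    return sorted(sites_for(samian, name) & sites_for(amphorae, name))


# Placeholder rows, not real find data.
samian = [Find("site_a", "Vitalis i"), Find("site_b", "Vitalis iii"), Find("site_c", "Other")]
amphorae = [Find("site_b", "Vitalis"), Find("site_c", "Vitalis"), Find("site_d", "Vitalis")]
print(shared_sites(samian, amphorae, "Vitalis"))  # ['site_b']
```

A shared name does not imply a shared identity: the intersection only shows where homonymous stamps co-occur, which is the comparison drawn in Figure 16.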

Figure 15
The workflow from the Samian Research relational database and its analysis with the SPARQLing Unicorn QGIS Plugin. Florian Thiery and Allard W. Mees, CC BY 4.0.

Figure 16
A: Distribution of Samian Ware produced at La Graufesenque by the Samian potter Vitalis i (red) and of Dressel 20 amphorae made in the Baetica area by an amphora potter Vitalis (black). B: Distribution of Samian Ware produced at Les Martres-de-Veyre by the Samian potter Vitalis iii (red) and of Dressel 20 amphorae made in the Baetica area by an amphora potter Vitalis (black). Allard W. Mees and Florian Thiery, CC BY 4.0.
Individual sites where ceramic finds have been recorded can be dated using Allen’s interval algebra. With the RDF-based Alligator method (Figure 17), groups of sites, such as Limes sections, can be sorted chronologically even when only a few chronological indications are known beforehand: the remaining Limes sections are placed according to the overlap of the material they have in common, and the result is represented as RDF (Thiery & Mees, 2025).
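For readers unfamiliar with Allen’s interval algebra, the following Python sketch classifies the relation between two dated intervals into one of the thirteen basic Allen relations. It illustrates the algebra only and is not the Alligator method itself; the function name and the example years are hypothetical.

```python
def allen_relation(a_start, a_end, b_start, b_end):
    """Return the basic Allen relation of interval A with respect to interval B."""
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("each interval needs start < end")
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if a_start > b_start and a_end < b_end:
        return "during"
    if a_start < b_start and a_end > b_end:
        return "contains"
    # Remaining case: the intervals overlap only partially.
    return "overlaps" if a_start < b_start else "overlapped-by"


# Hypothetical occupation spans in years AD for two sites.
print(allen_relation(40, 70, 60, 100))  # overlaps
print(allen_relation(80, 90, 60, 100))  # during
```

Relative statements of this kind remain usable when absolute dates are missing for some sites, which is the situation the Alligator method addresses.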

Figure 17
Aggregated individual sites as Limes sections and their relative Allen time intervals as RDF. Florian Thiery, CC BY 4.0.
In summary, Wikidata functions as a central linking hub for Roman archaeology by providing persistent identifiers, multilingual access, and cross-domain connectivity between otherwise isolated datasets. Its community-governed data model enables the reconciliation of specialist corpora such as Samian Research with complementary resources like CEIPAC without data duplication. Through explicit linking properties and federated SPARQL access, Wikidata enables scalable reuse, comparative analysis, and long-term interoperability across institutional and national boundaries.
Notes
[1] Available at: https://www.rgzm.de/samian (last accessed 13 January 2026).
[2] Available at: https://www.w3.org/TR/prov-o/ (last accessed 13 January 2026). PROV-O is the short form of ‘The PROV Ontology’; PROV is shorthand for provenance.
[3] Available at: https://github.com/BFO-ontology/BFO (last accessed 13 January 2026). BFO is the acronym for ‘Basic Formal Ontology’.
[4] Available at: https://ise-fizkarlsruhe.github.io/nfdicore/ (last accessed 13 January 2026).
[5] Available at: https://archaeonatural-cloud.github.io/archaeonatural-ontology/ (last accessed 13 January 2026).
[6] Hans Dragendorff (1870–1941) was the founder of provincial Roman archaeology and created the first form type catalogue in 1895 (Dragendorff, 1895).
[7] Available at: https://commons.wikimedia.org/wiki/File:Datamodel_in_Wikidata_for_Decibalus_Samian_Vessel.jpg (last accessed 13 January 2026).
[8] Available at: https://commons.wikimedia.org/wiki/File:Decibalus_Rubbing.tif (last accessed 13 January 2026).
[9] Available at: https://pleiades.stoa.org/ (last accessed 13 January 2026).
[10] Available at: https://www.geonames.org/ (last accessed 13 January 2026).
Acknowledgements
The authors would like to thank Dennis Gottwald for participating in the Samian transformation journey, as well as the Samian Research and Wikidata communities for curating the data.
Competing Interests
One of the authors is involved in the curation and maintenance of the Samian Research database, which serves as a primary data source for the Linked Open Samian Ware project discussed in this article. Samian Research is a publicly supported, community-curated resource with substantial contributions from an international community of volunteers.
The authors are directly responsible for the modelling, transformation, and publication of the Linked Open Samian Ware dataset and for contributions to Wikidata and the fuzzy-sl Wikibase presented in this study. These infrastructures are developed within open, non-commercial, community-based research environments and are published under established open licences.
The authors do not derive any financial, commercial, proprietary, or exclusive access benefits from these activities. This involvement is disclosed in the interest of transparency and does not affect the objectivity, interpretation, or validity of the research presented.
Author Contributions
Allard W. Mees – Conceptualisation, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Software, Supervision, Visualisation, Writing – original draft, Writing – review & editing.
Florian Thiery – Conceptualisation, Data curation, Formal analysis, Methodology, Software, Visualisation, Writing – original draft, Writing – review & editing.
