Mapping Our Heritage: Towards a Sustainable Future for Digital Spatial Information and Technologies in European Archaeological Heritage Management

Open Access | Jun 2019

Introduction

The rapid growth of digital spatial information has led archaeologists all over Europe to rely increasingly on digital data to prepare and carry out archaeological research, both in academia and heritage management. Spatial information, collected in large quantities by archaeologists since the implementation of the Valletta Convention (Council of Europe 1992), is also progressively used to guide heritage management policies, from urban design to rural planning and tourism (e.g. Cuca et al. 2012; McKeague et al. 2012; Huvila 2017). Furthermore, spatial information is increasingly used to involve the general public, through digital technologies in museums and other sites of archaeological interest (e.g. González-Tennant 2016; Seitsonen 2017), and to involve amateur archaeologists in data collection programmes, for example through crowdsourcing (e.g. Dhonju et al. 2017; Seitsonen 2017; CReAAH 2019; MOLA 2019; SCAPE 2019; Schweizerische Eidgenossenschaft 2019).

Since the quality of research results and heritage management decisions is highly dependent on the nature of the available data, issues of sustainability of digital data repositories, accessibility and reliability of data, standardization of data formats and management of property rights are currently widely debated. The lack of consistency in (spatial) data standards greatly inhibits the development of sustainable solutions for managing, sharing and analysing those data. Cross-regional or even supra-national analysis of data sets is therefore in most cases highly time-consuming or even impossible.

Even when datasets are managed according to well-documented standards, the development of appropriate tools to access and present relevant (spatial) information to researchers, heritage managers and general public is still very much in the stage of exploration and focuses mainly on project-specific contexts that often have a short lifespan. Software solutions typically have a short lifecycle as well and will cater for different user demands. The flexibility of interfaces is usually limited and does not support interoperability with other datasets. It can therefore be questioned whether these systems really provide the end users with the data and information that they would like to have.

In this position paper, we discuss the existing solutions and state-of-the-art approaches, and their effectiveness in providing the end users with relevant and up-to-date information, based on our experiences in developing data retrieval systems and spatial data infrastructures for various purposes. From there, we will reflect on how to develop more sustainable approaches and technologies for management and use of spatial data, particularly in research and archaeological heritage management, but also within the broader contexts of planning, design and public involvement. It is envisaged that data quality issues will be at the heart of future development work of spatial data infrastructures. However, the success of these infrastructures will also be guided by issues of Open Access and interoperability.

Historical overview

Development of national archaeological databases and their use

The origins of large (national) archaeological databases stem from the post-Second-World War shift in cultural heritage legislation and expansion of the archaeological enterprise in a number of developed countries (Willems 2000; Demoule 2012), leading to systematisation and standardization of archaeological surveys and the compilation of, at first, paper-based registries of archaeological sites (Norman & Sohlenius 2009; Niedziółka 2016). Although early computerised databases date back to the 1960s (Lock 2003), larger-scale deployment of digital information systems began to take place in the late 1980s and early 1990s (e.g. Willems 1997; Kuna 2002; Niedziółka 2016). The need to coordinate and manage archaeological fieldwork and effective decision-making required effective management of information. Maps and spatial data were especially important in this respect, as a core problem of archaeological heritage management and preservation is to collect and maintain accurate information on the location and extents of archaeological sites and monuments and investigated areas.

Even if the national sites and monuments registries were developed primarily as inventories (e.g. Sohlenius 2014) or administrative tools for heritage management (Niedziółka 2016), they also have proved to be useful catalogues of archaeological sites for researchers (Cooper & Green 2016; Niedziółka 2016). However, because of their aggregate nature, they tended to contain only a part of all documented information and their contents were seldom organised to be helpful for answering specific research questions pursued by individual researchers and projects (Meyer et al. 2007; Buckland & Eriksson 2014). In practice this has led to a proliferation of dedicated research project specific databases. Their heterogeneity and the lack of centralised repositories or portals have made them difficult to access and use (Kintigh 2006; Löwenborg 2014).

Role of GIS and spatial database infrastructures

The large-scale development and use of spatial databases began only after the introduction of GIS software that was reasonably easy to use. Together with affordable personal computers this opened up the possibility to access and build databases and conduct spatial analysis for individual archaeologists and heritage administrators. The emergence of digital spatial data analysis has been one of the most influential factors affecting archaeological work during the past few decades, even when it has been subject to criticism (cf. Rajala 2004; Valenti & Nardini 2004; Conolly & Lake 2006). Apart from the evident usefulness of GIS in archaeological research (see e.g. Lock & Stančič 1995), its potential was also soon noted in archaeological heritage management (Limp 2000). Therefore, the level of standardisation of some spatial datasets is high, particularly in ones developed for regulatory purposes, whereas comparatively little effort has been made to standardize primary evidence from fieldwork and research. This has undoubtedly contributed to the fact that, currently, of all archaeological information, spatial data is probably best represented in digital repositories even if there are many obstacles to managing, preserving and making spatial information accessible.

A central ambition of national archaeological databases was also to make archaeological information more accessible. Many of the archaeological databases and repositories compiled during the 1990s were entirely, or partly, searchable on the Internet or had an ambition to open up for online access at a later date (e.g. Wise & Richards 1999; Kuna 2002). Since the first web-based interfaces, the comprehensiveness of accessible data has improved as well as the general understanding of the uses of, and users of, archaeological spatial data (e.g. McKeague et al. 2012). But the question of the best way to treat legacy data and old online sources remains open.

Creation of digital archives

The digitalisation of the production of archaeological information in general and spatial data in particular from the 1990s onwards meant that the old paper-based practices of archiving archaeological data were rapidly becoming obsolete (Huvila 2016). Even if a large number of digital archaeological databases were developed already during the 1990s with an aim of recording and making available information on archaeological sites to support both research and heritage management (e.g. Roorda & Wiemer 1992), they were generally designed for hosting only a subset of all documentation. As Wise and Richards (1999) underline, with few exceptions (e.g. Eiteljorg II 1995), there was a chronic lack of consideration of long-term preservation of information.

Obsolescence of both software and storage formats is a major risk to the sustainability of digital data. In spite of the large-scale acquisition of spatial data, the paper report remained (and in many cases, still remains) the standard deliverable. The digital data either remains undiscoverable with the data creator (Shaw et al. 2009) or is deleted, and is hence of no subsequent value beyond the project lifespan. Another problem was, and still is, the proliferation of project and site-specific databases with their own peculiar data structures, concepts and vocabularies (Oikarinen & Kortelainen 2013) that have proved to be difficult to harmonise on a meaningful level. This is a problem common to many disciplines (Bowker & Star 1999). In the UK, the need to interrogate data from multiple projects has been acknowledged, but excavation and other fieldwork datasets produced by commercial archaeology are often created ‘on a per-site basis structured according to differing schema and employing different vocabularies. Consequently, cross search, comparison or other reuse of the data in any meaningful way remains difficult. This hinders the reassessment of the original archaeological findings and reinterpretation in the light of evolving research questions’ (Binding et al. 2015). The problem led to the development of the CIDOC-CRM EH extension and a range of semantic tools through the STAR and STELLAR projects (Binding et al. 2015). However, this semantic approach addressed data content rather than the spatial components (geometry) of project archives. Compared to other types of information, spatial data is relatively easy to work with but, as a plethora of projects have shown (e.g. Green 2012; Löwenborg 2014), integrating spatial data from different sources and coordinate systems is an arduous undertaking. Nevertheless, the increasing availability and use of digital spatial data have been among the cornerstones and central enablers of the integration of archaeological datasets as a whole (Huggett & Ross 2004).

Even when ensuring that data are accessible in the long term, they are mostly treated on a project by project basis with little or no standardisation across datasets: that is, the mechanisms to ensure the creation, exchange and use of spatial data do not exist for archaeological data within a Spatial Data Infrastructure (SDI), defined as a framework of technologies, policies, and institutional arrangements that together facilitate the creation, exchange, and use of geospatial data and related information resources across an information-sharing community (ESRI 2016).

A lot of significant work for the development of (non-spatial) data infrastructures was initiated in the context of research and development projects around the turn of the century. In terms of developing and maintaining national digital repositories for preserving and providing access to archaeological information, the Archaeology Data Service (ADS) in the UK, established in 1996 (Richards 2002), and the Dutch Data Archiving and Networked Services (DANS), established in 2005 (Gilissen & Hollander 2017), have been pioneers in the field. Even though both services provide excellent models for the preservation of digital data, there is no coordination of the spatial value of that data. Furthermore, the work towards establishing proper digital archives for keeping archaeological data has progressed slowly and many countries are still lacking comprehensive infrastructures for archiving archaeological information, including spatial data. Despite the symbiotic relationship between inventories and the archive, all too often the two functions are performed by separate institutions. This artificial separation limits the spatial potential of (project) data to inform and enhance the inventory and for the inventory to signpost the archive through connected map layers.

State of the art

International guidelines and standards

The value of geospatial data is recognised internationally through the coordinating efforts of the Open Geospatial Consortium (OGC 2019a) in defining Open Data standards for the global geospatial community, and through the United Nations Global Geospatial Information Management (UN-GGIM) strategic framework (UN-GGIM 2018). Many international and national initiatives, including the INSPIRE Directive (European Commission 2019), which aims to realise the potential of environmental data in Europe, acknowledge the value of spatial data for a range of activities including decision making processes. Governments including The Netherlands, through Geonovum (Geonovum 2019), and the United Kingdom, through the Geospatial Commission (Gov.UK 2019), recognise the value of geospatial data to society and economy.

Approaches to the creation, collation and distribution of spatial data in archaeology across Europe are extremely fragmented. Although the Valletta Convention recognises the need for national inventories, to keep them up to date and to facilitate the national and international exchange of scientific information, implementation is left up to the national laws, institutions and approaches of individual signatory states.

Broadly the functions within each state may be defined as

  • regulatory – the designation and management of the archaeological resource through formal (legislative) and informal (planning control and management agreements) processes;

  • archives and collections – the long-term deposition and preservation of paper, physical and increasingly digital material; and

  • investigation and research – undertaken by a range of organisations in the public, private and third sector (community, volunteer and crowdsourcing), each with their own priorities.

Public institutions are focused on delivering their key corporate objectives, often with little opportunity for innovation, whilst the growing commercial sector is predominantly driven by competitive processes focusing on project delivery rather than contributing to the bigger picture. Financial constraints and the need for organisations to demonstrate their impact, coupled with institutional inertia in the face of the digital revolution, reinforce data silo mentalities, leaving the potential offered by digital datasets far from being realised. Moreover, Valletta could not foresee the revolution and opportunities offered by digital data, so there are no incentives or drivers to work collaboratively towards developing an SDI for cultural heritage data.

Despite the obvious benefits of frameworks developed to contribute to interoperability such as the CIDOC CRM (Doerr 2003; ICOM CIDOC 2019a) or more pragmatic solutions required to collate and share data locally, there is no true coordination, leadership or mandate either at European or national level from within the profession to coordinate harmonising spatial data across organisations and jurisdictions.

Instead, the key drivers mandating the harmonisation and publication of datasets are external. In particular, the European Union INSPIRE Directive, transformed into the national legislation of member countries, requires that Protected Sites are published as WMS and WFS to agreed technical specifications, to inform Community environmental policies and policies or activities which may have an impact on the environment. As legislative frameworks and working practices differ in detail across Europe, approaches to the publication of ‘protected sites’ are uneven with some authorities only releasing formally designated data rather than full inventories.

INSPIRE only addresses authoritative datasets but archaeologists routinely create spatial data through a range of increasingly sophisticated digital fieldwork techniques. There is no requirement to collate spatial data from different archaeological projects into a single resource. For example, a map of the archaeological landscape may be created and published by applying consistent data standards to transcriptions of individual archaeological sites (Figure 1). This map may be used for internal management purposes, published online as part of a heritage portal (Canmore), and exposed through a metadata catalogue as WMS or WFS for others to access on their own GIS. Yet all too often data is created on a project by project basis following different conventions and formats, greatly restricting the ability to develop maps of the archaeological landscape.

Figure 1

The archaeological landscape from Neolithic, through Roman to industrial at Inveresk, Scotland as revealed by cropmarks and published on Canmore: the online portal of the National Record of the Historic Environment for Scotland.

Within Europeana (Europeana Foundation 2019), CARARE (Carare 2019) acts as an aggregator for archaeological and architectural heritage but the emphasis is firmly placed on the cultural value of objects within virtual collections rather than the spatial content. More recently the ARIADNE Infrastructure project (ARIADNE 2019a) pooled existing archaeological research data infrastructures through new and powerful technologies to provide a European-wide interoperable dataset. Spatial data is limited to location based searching rather than rendering the spatial footprint of the asset. The portal displays the location of records in the system. Figure 2 shows the location of an excavation undertaken by INRAP in Blois, France. Users can read a summary of the project on the portal (ARIADNE 2019b) or follow a hyperlink through to the contributing organisation’s own resource (INRAP 2019). However, INRAP also maps the spatial footprint of excavations and the features revealed (Figure 3). Rich attribution allows the user to explore the data. This data could be added to the portal as either a WMS or WFS. The limit of that kind of project is that only the INRAP data is available online. This excludes all data from the 20th century, which was available in the former PATRIARCHE system and will soon be accessible through the Pleade portal (PLEADE 2019) from the French Ministry of Culture. Even this platform does not share research and university data, despite the existence of European top-down tools like ARIADNE or Europeana and several bottom-up solutions, like ArkeoGIS (2019) or very local tools like those of the ChasseoLab (2019).

Figure 2

The ARIADNE portal displays the locational details of an archaeological intervention.

Figure 3

Spatial data is more than a place marker: much more spatial data was recorded by INRAP during the fieldwork, including both the project and trench extents as well as the locations of individual features.

The two examples presented here (Figures 1–3) require the consistent application of data standards to combine data from multiple projects into organised datasets managed as part of core business applications and published online through web-GIS portals. For INSPIRE mandated datasets, data is also available as WMS and WFS, enabling other organisations, such as environmental consultancies, and researchers to work more efficiently by loading information directly into their own systems. Although the ARIADNE Portal does not use the dynamic WMS and WFS mandated by INSPIRE (in part because the attribution is not rich enough), the portal could both signpost and consume those services to deliver truly spatial datasets within a dedicated heritage portal.
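As a rough illustration of what consuming a WFS "directly into their own systems" involves, the following minimal sketch builds a WFS 2.0.0 GetFeature request URL restricted to a bounding box. The endpoint and layer name are hypothetical placeholders, not services referred to in this paper, and a real server may expect a different axis order or CRS notation.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint and layer name, used only for illustration.
WFS_ENDPOINT = "https://example.org/geoserver/wfs"
LAYER = "heritage:protected_sites"


def build_getfeature_url(endpoint, type_name, bbox, crs="EPSG:4326", count=100):
    """Return a WFS 2.0.0 GetFeature URL restricted to a bounding box.

    bbox is (min_x, min_y, max_x, max_y); the axis order a server expects
    depends on the CRS and the server configuration (assumption here: x, y).
    """
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        # WFS key-value encoding appends the CRS identifier to the bbox values.
        "bbox": ",".join(str(v) for v in bbox) + "," + crs,
        "count": count,
    }
    return endpoint + "?" + urlencode(params)


url = build_getfeature_url(WFS_ENDPOINT, LAYER, (-3.1, 55.9, -3.0, 56.0))
print(url)

# Decode the query string again to inspect the parameters that were sent.
query = parse_qs(urlparse(url).query)
print(query["typeNames"])
```

The resulting URL could be pasted into a desktop GIS or fetched by a script; the response format depends on what the server offers.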

Archaeological spatial data tend not to conform to recognised ISO standards. The ISO standard ISO 19115-1:2014 (ISO 2014) defines the schema for describing geographical information and associated services, including contents, spatial-temporal extents, data quality, access and rights to use (Shaw et al. 2009: 8). Metadata should be discoverable online so that a user can understand the nature, content and extent of the data. Additionally, there should be enough information to allow the user to explore the data and assess whether it is fit for purpose. For complex datasets such as remote sensing data, this should include technical metadata documenting the instrumentation and resolution of the capture scale. Finally, exploitation metadata is essential to allow the user to access, transfer and apply the data in their own systems both within the heritage sector and for wider environmental benefits. These technical specifications need to be defined and documented, including mapping to the CIDOC-CRM and CRMgeo (Hiebel, Doerr & Eide 2017; ICOM CIDOC 2019b) standards.
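The three purposes of metadata named above (discovery, exploration and exploitation) can be checked mechanically before a dataset is published. The sketch below is only an illustration: the field names are hypothetical and are not the element names defined by ISO 19115.

```python
# Hypothetical field groups following the three purposes named in the text.
# These are NOT the element names defined by ISO 19115.
REQUIRED_FIELDS = {
    "discovery": ["title", "abstract", "spatial_extent", "temporal_extent"],
    "exploration": ["data_quality", "capture_scale", "instrument"],
    "exploitation": ["format", "coordinate_system", "access_url", "licence"],
}


def missing_metadata(record):
    """Return {group: [field names]} for fields that are absent or empty."""
    gaps = {}
    for group, fields in REQUIRED_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[group] = absent
    return gaps


# Illustrative record, not a real dataset description.
record = {
    "title": "Cropmark transcription (illustrative record)",
    "abstract": "Polygons transcribed from aerial photographs.",
    "spatial_extent": (-3.1, 55.9, -3.0, 56.0),
    "temporal_extent": "",
    "format": "GeoPackage",
    "coordinate_system": "EPSG:27700",
}

gaps = missing_metadata(record)
print(gaps)
```

A report of this kind says nothing about whether the supplied values are correct; it only flags which groups of metadata a user would find incomplete.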

Then there is the need to demonstrate the benefits of developing a standardised approach for spatial information in heritage to both archaeologists and those outside the profession. Project-led recording pays little attention to consistent, interoperable data so that it is time consuming to collate data from different projects – even those undertaken by the same organisation. Project driven, fragmented spatial data presents archaeology as amateurish, although this is not a problem unique to archaeology. It is time consuming to find and combine data and to the end user the lack of consistency in how information is displayed can appear unprofessional. Benefits to a consistent approach are both practical and creative, so long as space remains for local specificity in data. Users are able to work more efficiently through information access and consistent data has the potential to offer new insights.

Technologies used

Problem statement

The introduction of spatial technologies has created unprecedented opportunities for data collection, analysis and archiving. However, it could also be argued this has in some ways worsened rather than improved the interoperability of archaeological information. Hitherto, any person knowledgeable in archaeological terminology could access either handwritten or printed materials to inspect the results of archaeological research. The digitization process created new opportunities, but also imposed new hurdles on research data accessibility:

  • The new technologies require a significant level of geospatial information (GI) literacy (see Nazari 2011; De Kleijn et al. 2014) that is at odds with the text-oriented skills of humanities scholarship.

  • Even if data are properly curated, the differences in digital strategies used to describe the archaeological interpretations and methodology make it very hard to combine information from different archaeological projects. This is especially the case for spatial archaeological information. Previously an archaeologist required no more than paper maps to (re-)interpret; now one has to deal with spatial data files in possibly differing formats, coordinate systems and information schemas. The visually interpretable world of archaeology has become much less intuitive. When a standardized format or system is used, differences in format and coordinate system can be easily overcome in a technical sense. Yet, differences in information schemas are not so easily alleviated if one does not have access to the exact semantics of the differences.

  • Authorities have been lagging behind in adopting standards that alleviate these tensions in the (re-)use of digital archaeological data. For instance: Dutch archaeology created the SIKB0102 information exchange format (Van ‘t Veer 2012; Boasson & Visser 2017), but this standard has to date not been enforced by any organisation. So, currently most information models for archaeological data collections are hidden in the database applications where these data collections reside.

  • Although GIS systems have contributed a lot to improve working with large spatial databases on a scale where paper maps become useless, the curation of archaeological field data still requires a lot of time and effort.
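The second point above notes that differences in coordinate systems can be overcome technically once a standard system is used. As a minimal sketch of what that means, the code below converts between WGS84 longitude/latitude and spherical Web Mercator (EPSG:3857) using the closed-form formulas. This is an illustration only: in practice a projection library would be used, and national grids require datum transformations that are not shown here.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse conversion: Web Mercator metres back to degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


# Round trip for an arbitrary illustrative location.
x, y = wgs84_to_web_mercator(-3.05, 55.94)
lon, lat = web_mercator_to_wgs84(x, y)
print(round(lon, 6), round(lat, 6))
```

The point of the sketch is that the geometry can be moved between systems deterministically; nothing comparable exists for reconciling two information schemas whose semantics are undocumented.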

Seen from the perspective of knowledge and information systems, there have been important developments in the subdomains of knowledge systems known as knowledge representation and reasoning (or KRR) and artificial intelligence. These fields represent the state of the art when it comes to advanced knowledge representation and data analysis strategies and technology. The question therefore is: how can these fields contribute to the improved (re-)use of archaeological data, digital spatial data in particular?

Knowledge representation in archaeology

One of the largest successes in the field of knowledge representation has been in the development of the Semantic Web movement (Berners-Lee et al. 2001). The implementation of semantic web strategies is called Linked Data and its data framework RDF – the Resource Description Framework. Linked data is information broken down in so-called triples, each of which represents an entity-attribute-value statement (Antoniou & van Harmelen 2004: 24–25). Each triple is thus a combination of a subject (entity), a predicate (attribute) and an object (value). While this idea may seem abstract, its example application for archaeology is very straightforward: the information elements making up a Harris matrix are exactly that: a subject (an archaeological feature) being related through a predicate (e.g. cuts, cut by, etc.) to another object (another archaeological feature).
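The Harris matrix analogy can be made concrete with a minimal sketch in plain Python, without an RDF library. The context identifiers and predicate names below are hypothetical and do not come from any published vocabulary; the `match` helper is a simple stand-in for a triple pattern query.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical context identifiers and predicate names, for illustration only.
triples = [
    Triple("ctx:pit_12", "strat:cuts", "ctx:layer_7"),
    Triple("ctx:layer_7", "strat:overlies", "ctx:layer_3"),
    Triple("ctx:pit_12", "rdf:type", "arch:Pit"),
    Triple("ctx:layer_3", "rdf:type", "arch:Layer"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# Which features cut which other features?
cuts = match(triples, predicate="strat:cuts")
print(cuts)
```

In a real Linked Data setting each of these strings would be a web address (IRI), so that the meaning of a predicate such as "cuts" can be looked up rather than guessed.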

In fact, archaeological field data, in the relationships between archaeological contexts and other archaeological contexts, finds and methodological constructs, function very much like a complicated graph structure that is at odds with the relational database structures that we tend to use to describe these entities. Instead, the RDF structure of describing these entities in terms of their relationships to other entities allows for easier deep graph traversal and network analysis. The use of Linked Data technologies, however, is by no means simple. The problem of GI literacy may be reduced in the future by the rise of generations of archaeologists that know no other world than the digital, but the technical expertise required to express archaeological data as Linked Data is considerable.
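To illustrate what deep graph traversal means for stratigraphic data, the following sketch collects every context that is earlier than a given context by following relationships transitively. The relations are hypothetical and form a simplified Harris matrix; a breadth-first search is used here as one possible traversal strategy.

```python
from collections import defaultdict, deque

# Hypothetical (later, earlier) relations between contexts.
later_than = [
    ("pit_12", "layer_7"),
    ("layer_7", "layer_3"),
    ("layer_7", "ditch_5"),
    ("ditch_5", "natural_1"),
    ("layer_3", "natural_1"),
]


def all_earlier_contexts(edges, start):
    """Return every context reachable from start by following later -> earlier."""
    graph = defaultdict(list)
    for later, earlier in edges:
        graph[later].append(earlier)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


print(sorted(all_earlier_contexts(later_than, "pit_12")))
```

In a relational database the same question typically needs a recursive query or repeated self-joins, which is the mismatch the paragraph above describes.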

The second area where semantic web technologies can contribute is in describing the semantics, using web addresses. The prevalent way to embed the semantics of data in the data itself is through Linked Data. RDF is often used as a standard to describe the properties and semantics of data in a way that is much more descriptive than just an ambiguous label that is often used in databases, such as “date”. It is easy to imagine confusion over the property “date”: is this the dating of some find, the time it was collected in the field, when it was recorded in the field recording system, when it was entered in the database, altered in the database or when it was deposited in an archive? Best practices in RDF require you to model your data in more specific terms than is customary in most current (spatial) database systems, allowing users to look up the exact intended use of an instance just by following the links that describe the types and predicates of the data. Even if two data sets use a different vocabulary to describe the same types of entity, the classes used in these vocabularies can be (locally) equated to or subsumed, allowing the user to query across these data sets using one vocabulary.
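Querying across two vocabularies can be sketched as follows. The data sets and class names are hypothetical; the `equivalent_to` dictionary plays the role that a statement such as `owl:equivalentClass` would play in a real Linked Data setup, and the sketch handles equivalence only, not subsumption.

```python
# Two hypothetical data sets using different class names for the same kind of entity.
dataset_a = [
    ("a:feature_1", "rdf:type", "vocabA:Posthole"),
    ("a:feature_2", "rdf:type", "vocabA:Ditch"),
]
dataset_b = [
    ("b:feature_9", "rdf:type", "vocabB:PostHoleFeature"),
    ("b:feature_4", "rdf:type", "vocabB:Pit"),
]

# Local equivalence statements: class in vocabulary B -> class in vocabulary A.
equivalent_to = {"vocabB:PostHoleFeature": "vocabA:Posthole"}


def instances_of(cls, *datasets):
    """Return subjects typed as cls, after translating equivalent classes."""
    results = []
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            if predicate == "rdf:type" and equivalent_to.get(obj, obj) == cls:
                results.append(subject)
    return results


postholes = instances_of("vocabA:Posthole", dataset_a, dataset_b)
print(postholes)
```

The mapping is declared once, outside both data sets, so neither source needs to be altered in order to be queried with the other's vocabulary.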

The development of the CIDOC-CRM extensions CRMgeo and CRMarchaeo (ICOM CIDOC 2019c; Nicolucci 2017) is highly relevant here. After its CRM-EH predecessor (CRM-EH 2019), CRMarchaeo may be the first internationally drafted meta-standard for describing archaeological data using RDF. In the long term, it could go a long way towards solving problem statement no. 2 above: the lack of interpretable data. Also, if developed further, it could help with problem statement no. 3: the lack of adoption of shared archaeological data standards. However, it is still in a very early stage, and, as part of the CIDOC-CRM initiative, is not a domain ontology, but a top ontology, intended to be implemented in domain ontologies. As such, it covers archaeological concepts only on a highly abstract level, such as generic archaeological process units. CRMarchaeo differentiates between different archaeological process unit subtypes such as trenches and sections, which is a level of description required to express the meaning of actual data records. As a consequence, at present there is no overarching domain ontology derived from CRMarchaeo that allows easy use and implementation. Furthermore, it exacerbates problem statement no. 1, because semantics are complex to express in Linked Data: one needs not only a very high degree of understanding of how archaeological concepts are related, but also considerable Linked Data expertise to create a good semantic model.

Coordinate systems, place name gazetteers and ontologies provide tools that enable the cross-searching and analysis of spatial data but spatio-temporal data, often modelled locally or regionally, presents a particular challenge. The problem has been addressed through PeriodO (2019), cataloguing ‘not global period concepts, but specific period definitions: authoritative assertions about the chronological and geographical coverage of period concepts, expressed using machine-readable coordinates (including start and end dates as well as geographic boundaries)’ to ‘facilitate the discovery of chronologically related data across heterogeneous digital resources’ (Rabinowitz et al. 2016: 44–45).
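The idea of period definitions as scoped assertions, rather than global concepts, can be sketched as a small data structure. The values below are illustrative only and are not taken from PeriodO; geographic coverage is reduced to a plain label, whereas real definitions carry geographic boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodDefinition:
    label: str
    source: str      # who asserts this definition
    start_year: int  # negative values are BCE
    end_year: int
    region: str      # simplified stand-in for a geographic boundary


# Illustrative values only; not taken from PeriodO or any real gazetteer.
definitions = [
    PeriodDefinition("Iron Age", "Source A", -800, 43, "Region X"),
    PeriodDefinition("Iron Age", "Source B", -500, 1050, "Region Y"),
    PeriodDefinition("Roman", "Source A", 43, 410, "Region X"),
]


def overlapping(defs, start_year, end_year, region=None):
    """Return definitions whose date range intersects [start_year, end_year]."""
    return [
        d for d in defs
        if d.start_year <= end_year
        and d.end_year >= start_year
        and (region is None or d.region == region)
    ]


hits = overlapping(definitions, 100, 200)
print([(d.label, d.source) for d in hits])
```

Two definitions sharing the label "Iron Age" can cover different date ranges in different regions, which is why a query by label alone is not enough to find chronologically related data.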

The CIDOC-CRMgeo extension is an ontology extension modeling spatio-temporal aspects of heritage in general. It bridges the gap of cultural heritage modeling with geospatial and temporal properties, opening the possibility for integrating knowledge bases between heritage subdomains. CRMgeo interfaces with a geospatial extension called GeoSPARQL (OGC 2019b), that, in theory, facilitates spatial interoperability.

The GeoSPARQL standard is composed of a vocabulary part and a functional SPARQL extension. The GeoSPARQL vocabulary is highly useful in separating and describing spatial properties (geometries) from their objects (features), where in non-semantic data formats, these separate entities are often confused and concatenated. In GeoSPARQL, multiple geometries can describe the same object, at different functional levels: one can supply a centroid point coordinate for the use of labeling, and a complex multipolygon to describe its spatial extents. The extension links the GeoSPARQL vocabulary to CIDOC-CRM spatial and spatiotemporal concepts. At first sight this is of great value to incorporate cross-domain geospatial standards rather than to re-invent the wheel, but the practical value of both the CIDOC-CRM geospatial and archaeology extensions needs careful evaluation in future work.
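The separation between a feature and its geometries can be sketched as below. The class and role names are hypothetical; in GeoSPARQL itself the corresponding terms are `geo:Feature`, `geo:Geometry`, `geo:hasGeometry` and `geo:asWKT`. Coordinates are arbitrary illustrative values.

```python
from dataclasses import dataclass, field


@dataclass
class Geometry:
    role: str  # hypothetical label, e.g. "label_point" or "extent"
    wkt: str   # geometry serialised as Well-Known Text


@dataclass
class Feature:
    identifier: str
    name: str
    geometries: list = field(default_factory=list)

    def geometry_for(self, role):
        """Return the first geometry with the given role, or None."""
        for geometry in self.geometries:
            if geometry.role == role:
                return geometry
        return None


# One feature described by two geometries at different functional levels.
site = Feature(
    "site:001",
    "Illustrative enclosure",
    [
        Geometry("label_point", "POINT (2 2)"),
        Geometry("extent", "MULTIPOLYGON (((0 0, 4 0, 4 4, 0 4, 0 0)))"),
    ],
)

print(site.geometry_for("label_point").wkt)
print(site.geometry_for("extent").wkt)
```

Because the feature is a separate entity, further geometries (for example a survey extent at another scale) can be attached without duplicating its descriptive attributes.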

The GeoSPARQL functions on the other hand offer little advantage over the range of functionality found in common spatial data infrastructures and GIS, except that it can be used in conjunction with Linked Data. For instance, common functions to calculate polygon areas, reproject to different coordinate systems or return the number of vertices in a geometry are lacking in the current version 1.0 (Perry & Herring 2012).
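Functions of this kind are simple to compute once the geometry has been retrieved, which is one workaround when a query language does not provide them. The sketch below calculates a planar polygon area with the shoelace formula and counts vertices. It assumes a single closed ring in a projected (metric) coordinate system and ignores holes; the coordinates are illustrative.

```python
def ring_area(ring):
    """Planar area of a closed ring [(x, y), ...] using the shoelace formula.

    The ring must repeat its first point at the end.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def vertex_count(ring):
    """Number of distinct vertices; the repeated closing point is not counted."""
    if ring and ring[0] == ring[-1]:
        return len(ring) - 1
    return len(ring)


# Illustrative 4 m x 3 m trench outline in a projected coordinate system.
trench = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]

print(ring_area(trench))
print(vertex_count(trench))
```

Doing this client-side means pulling the full geometry out of the triple store first, which is less efficient than a server-side function for large datasets.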

In considering the CRMarchaeo and CRMgeo extensions, there is still a large void between the abstract set of top concepts in CIDOC-CRM and their practical implementation, as implementation documentation and manuals for the extensions appear to be unavailable.

Artificial Intelligence for archaeology

The field of artificial intelligence and, in particular, of machine learning has been diversifying and improving fast. Developed by large technology companies, the available technologies tend to be highly abstract and hard to use but are rapidly gaining in usability and performance. Machine learning comprises a set of data analysis strategies that rely mostly on principles derived from statistics and linear algebra (Goodfellow et al. 2016).

Archaeological applications of machine learning are still rare and mainly confined to issues of automated classification (Van der Maaten et al. 2007; Hörr, Lindinger & Brunett 2014) and prediction (Oonk & Spijker 2015). In the ARIADNE project, an experimental automatic hypothesis generating setup was tried to see if novel hypothetical viewpoints could be formulated from patterns in the data itself (Wilcke et al. 2017). Other studies are underway, such as in the ArchAIDE project (ArchAIDE 2019), but publications in general on AI and archaeology are still few and far between.

However, it is likely that machine learning may have a tremendous impact on almost every data heavy analysis in archaeology, in support systems for

  • classifying, tagging or captioning of digital field or artefact images

  • automated extraction of feature geometries from (unmanned) aerial photography and LiDAR (Verschoof-van der Vaart & Lambers 2019)

  • classification and description of archaeological features (Van’t Veer et al. 2018)

  • classifying, dating and matching of artefacts

  • error detection in archaeological records

  • predictive modelling of site locations and routes.

These goals are coming within reach given the current state of machine learning technology. With regard to problem statement 4 – the time-consuming curation of archaeological information, and issues relating to the fragmentary nature of data (e.g. Löwenborg 2018) – machine learning technologies are likely to be able to play a significant role, but with so few practical results it is too early to tell.

Challenges

Sustainability

As highlighted earlier, efforts to guarantee the sustainable archiving of digital archaeological data have been partially successful at the local and national level, even when the spatial dimension of archaeological data has not been the primary focus. However, even well-established initiatives are still very vulnerable to changes in the political climate. Almost all organisations curating archaeological data do so on the basis of governmental funding, and changes in political priorities can easily lead to loss of datasets if curating these is not part of the legal obligations of administrative authorities at local, regional, national or transnational levels. The Dutch national digital archive DANS, for example, is currently financed by the Dutch government as a partnership between the Royal Dutch Academy of Sciences (KNAW) and the Netherlands Organisation for Scientific Research (NWO), as part of a larger effort to curate scientific research data. However, there is no guarantee that this agreement will continue in the future, and if so, whether accessing data will remain free of charge, and if DANS will continue to receive enough funding to adapt its services to changing technological and research environments. And the Netherlands are ahead of many other countries in acknowledging that there is a problem that needs to be addressed at the national level. The funding for the German IANUS initiative (IANUS 2019), for example, was ended in 2017 without a follow-up strategy for implementing the project’s results. Clearly, sustainability is not just a matter of developing exchange standards and creating repositories, but also of developing infrastructures that have a sound financial and legal basis.

Open Data and the FAIR Principles

External factors can act as a catalyst for change in archaeological data management. In publication, the move to Open Access is gradually changing working practices. The growth of data papers heralds new working practices enabling reuse and critical reassessment of primary data. The vision for developing a sustainable future for spatial data is aligned with the Open Data Charter (2019) and the FAIR principles (Data FAIRport 2019). Releasing data under an Open Data licence ensures that the terms for reusing data are clearly defined. Open Data, therefore, represents an important step in developing collaborative mapping solutions.

Open Data are typically thought of as textual data, but the principles and definitions are equally applicable to spatial data. An additional challenge for some spatial data is third party intellectual property rights and copyright for data sourced against national mapping agency or other vendor products. Going forward, openness needs to be built into data creation to ensure ease of re-use.

Legacy data presents another challenge. Archive datasets, from paper records to most born-digital data, were created without thought of transformation or re-use in a spatial data infrastructure. Traditional approaches persist, with most spatial data consigned to an illustration in a project report, fossilising in print the spatial knowledge often acquired digitally in the field. Copyright issues aside, much of this information cannot be easily or accurately geo-referenced, and the sheer time and effort required to capture legacy data appears daunting. However, a pragmatic approach might be to capture the data ‘as required’ to support other project work or through training programmes.

Counterarguments to opening data are familiar. Publishing site locations raises the risk of looting through metal detecting and other activities. There will always be a hard core of determined looters, but risks can be mitigated through education and cooperation. At the same time promoting heritage amongst local communities often raises awareness in their surroundings empowering a sense of stewardship. A middle ground between those interests is often found by permitting access to data at various levels of detail for various user groups. The Portable Antiquities of the Netherlands project (PAN 2019), for example, provides searching and mapping of (metal) items at the municipality level, but will not provide exact coordinates unless the user has a login to the system. The Portable Antiquities Scheme in England (British Museum & Amgueddfa Cymru 2019) will provide the spatial information to the nearest 1 km square.
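Tiered access of this kind can be sketched in a few lines. The example below is a hypothetical illustration, not the implementation used by either scheme: it assumes projected coordinates in metres and returns the exact position to authenticated users, while other users only receive the south-west corner of the 1 km grid square containing the find. The function names and the coordinates are invented for the example.

```python
import math


def grid_square_origin(easting, northing, cell_size=1000):
    """Return the south-west corner of the grid square that contains the point."""
    return (
        math.floor(easting / cell_size) * cell_size,
        math.floor(northing / cell_size) * cell_size,
    )


def location_for(user_is_authenticated, easting, northing):
    """Give exact coordinates to authenticated users and a generalised location otherwise."""
    if user_is_authenticated:
        return (easting, northing)
    return grid_square_origin(easting, northing)


# Hypothetical find spot in a metric national grid.
print(location_for(True, 155432.7, 463218.2))
print(location_for(False, 155432.7, 463218.2))
```

Generalising to an administrative unit such as a municipality would follow the same pattern, with a lookup of the containing unit in place of the grid calculation.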

Open Data elsewhere also challenges perceptions of data ownership. Archaeologists are often reluctant to share data, arguing that in so doing they are giving away their research. Yet their research is built upon data often gathered at public expense. Under Open Data elsewhere, early release of the data can help inform research and challenge conclusions. The rise of data papers, providing citations, helps demonstrate the changing thinking towards data in general but perhaps more is required to acknowledge the work of the original researcher.

Some agencies have developed business models towards accessing data where a limited amount of data is freely available with the rest behind a paywall. The impact of Open Data elsewhere within government agencies on freemium data models is yet to be fully explored. Approaches to opening data have to be tailored to the specific conditions of each country, but archaeologists also need to recognise the relationship their data has with the rest of the geospatial world. Through INSPIRE some data has been released to enable and inform others to make decisions that affect the historic environment. INSPIRE raises the expectation that public data is findable and reusable. Development of web portals accessing geospatial web services, and complex modelling tools like ecosystem services expect to find and use data. If archaeologists do not participate in these systems, there is a very real danger that their data will simply be ignored.

Data quality management

Despite the fact that standards for data description and exchange are now well developed and increasingly being accepted within the archaeological community, there is still a lack of common concepts to be used for the evaluation of data quality. The most common quality aspects of archaeological data can be related to four main types of uncertainty: spatial precision, chronological certainty, interpretational accuracy and completeness of documentation (Verhagen et al. 2016). However, methods to define levels of certainty and to compare datasets with different accuracies have hardly been a subject of debate within the archaeological community. For example, the practice of ‘fuzzy’ dating of archaeological objects and assemblages (see e.g. Green 2011; Crema 2012) does not fully solve the problem of understanding chronological uncertainties, since it is inevitably based on a subjective assessment of the accuracy of dating, unless absolute dating methods are available. To rely on absolute chronology, however, seems essential, even when lots of dates are fuzzy. Different online gazetteers (e.g. PeriodO, Pleiades) and ontologies (e.g. OWL-Time) appear to offer good intermediate solutions.
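To make the notion of a ‘fuzzy’ date concrete, the sketch below spreads a dating range over fixed time slices in proportion to the overlap. It assumes that the true date is equally likely anywhere within the range, which is a common simplification and a deliberate assumption here, not necessarily the method of the studies cited above. Years are given as signed integers, and the function name and values are hypothetical.

```python
def date_range_weights(start, end, bin_edges):
    """Return, for each time slice, the share of the range [start, end) that falls within it.

    Assumes a uniform probability over the dating range.
    """
    span = end - start
    if span <= 0:
        raise ValueError("end must be later than start")
    weights = []
    for lower, upper in zip(bin_edges, bin_edges[1:]):
        overlap = max(0, min(end, upper) - max(start, lower))
        weights.append(overlap / span)
    return weights


# A find dated between 50 BC and AD 150, evaluated against century-long slices.
slices = [-100, 0, 100, 200]
print(date_range_weights(-50, 150, slices))
```

The subjective element discussed above remains: the weights are only as good as the start and end dates assigned to the find.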

From a practical point of view, data quality assessment should start from the definition of an ideal dataset and a listing of its characteristics. For example, an ‘ideal’ field survey dataset should be based on dGPS-based coordinates, should follow a field walking strategy that accords with current scientific standards, and should employ a standardized and accepted system for recording and describing finds. Even such a seemingly simple data quality description will be quite complex to define, let alone ‘mapping’ existing field survey datasets onto it and evaluating data quality on this basis.
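As a minimal sketch of what ‘mapping’ a dataset onto such an ideal could look like, the example below records the three characteristics named above and lists where a dataset falls short. The class, its field names and the category labels are hypothetical; a real scheme would need far finer distinctions.

```python
from dataclasses import dataclass


@dataclass
class SurveyDataset:
    """Hypothetical description of a field survey dataset."""
    name: str
    positioning: str                      # e.g. "dGPS", "handheld GPS", "map sketch"
    follows_survey_standard: bool         # field walking strategy meets current standards
    uses_standard_finds_recording: bool   # accepted system for recording and describing finds


def quality_gaps(dataset):
    """List the respects in which a dataset departs from the 'ideal' field survey dataset."""
    gaps = []
    if dataset.positioning != "dGPS":
        gaps.append(f"coordinates recorded by {dataset.positioning}, not dGPS")
    if not dataset.follows_survey_standard:
        gaps.append("field walking strategy does not follow current standards")
    if not dataset.uses_standard_finds_recording:
        gaps.append("finds not recorded in a standardized system")
    return gaps


legacy = SurveyDataset("legacy survey", "map sketch", True, False)
for gap in quality_gaps(legacy):
    print(gap)
```

Reporting gaps as a list, as opposed to a single score, keeps the judgement of whether the data is fit for a given purpose with the user.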

Towards realising the value of spatial data in archaeology

Multiple agencies/actors create and use spatial data in archaeology, but the potential of that data remains unrealized despite initiatives in the wider geospatial community. This paper has outlined existing approaches/initiatives to delivering value from spatial data. The strengths and weaknesses of the different approaches are summarized in Table 1.

Table 1

Suggested strengths and weaknesses of different approaches to realising the value of spatial data from archaeological datasets.

| | Infrastructure | Conceptual Reference Model | Data reusers | Data creators |
|---|---|---|---|---|
| Initiatives | INSPIRE | CIDOC-CRM and extensions | Research projects | Archaeological practitioners |
| Key driver | Legal requirement | Standards Research | Research | Business requirements |
| Engagement | Compulsory for mandated public sector datasets | Research led | Research led | Core part of business delivery |
| Strengths | Part of a robust infrastructure for sharing environmental data across Europe | Accepted international standard | Well-funded innovative development phase | Business efficiency |
| | Implementing organisations capable of sustaining data delivery through core funding | Provides an integrating framework for diverse data structures | Project partners represent a coalition of the willing | Can define own specifications |
| | Data discovery and delivery provides (relatively) easy access to, and reuse of, data | | Aggregates data from multiple sources | |
| Weaknesses | Restricted to ‘Protected Sites’ data | High technological threshold to implement | Spatial component generally restricted to locational data only | Fragmented approach: project focused; multiplicity of recording standards |
| | Does not address data created through archaeological fieldwork and research | Skills deficiency in wider archaeological community | Partners predominantly research focused and coalitions of the willing | Long term value of data beyond project lifespan not recognised |
| | Only applies to public sector data | Low engagement from public sector | Dependent on research funding for sustainability | |
| | | | Tailored Conceptual Reference Models | |
| | | | Value rests with the data aggregator | |
| | | | Incomplete datasets are not suitable for decision making | |
| | | | No engagement with data creators | |
| Sustainability | Legislative requirement for mandated datasets | Core standard maintained and developed by specialist community | Project life cycle | Reliant on an appropriate archive to host the data |

The two key approaches outlined in this paper address different aspects of realizing the spatial value from archaeological data. Development of consistent spatial data standards, following the INSPIRE model, requires effective data management through the definition and implementation of appropriate data standards and technical specifications across a range of archaeological spatial datasets. This model works well where there is a requirement to publish defined datasets, including spatial extents but does not address the richness and variety of the archaeological data behind the mapped content.

Semantic approaches, based on conceptual reference models such as the CIDOC-CRM and extensions, address interoperability by defining an integrating framework for manipulating data from multiple projects. These approaches underpin research projects, including Pelagios (Pelagios Commons 2019) and ARIADNE, that aggregate data from multiple sources. Although these approaches are very powerful, the voluntary nature of participation and the incompleteness of the data mean that the data cannot be relied on for decision-making purposes. Current initiatives reduce the mapping element to location – simple X and Y coordinates – because the full geometries of the archaeology are not required for their projects.

Much more effort is required to realise the potential of (spatial) data created through fieldwork and research. There are no agreed standards or specifications for documenting fieldwork, with organisations, and even separate projects within organisations, each defining their own technical specifications. Too often data from fieldwork is seen as a means to an end – the production of a report for a client – without consideration of the bigger picture. Addressing this issue requires a dialogue between those who create and those who curate data to define and share a common approach to maximizing the value of spatial data from fieldwork.

Future agenda

The themes approached in this paper show how much research time, expertise and imagination has already been invested at local, regional and international scales. It is now time to move towards larger scales through sharing knowledge and expertise within a structured framework, if possible by keeping the habits of the end users and the positive aspects of as many projects as possible.

Quantitative indicators of data quality

Large datasets inherently have their imperfections, due to the effects of combining various sources with different standards of data entry and data management practices (see e.g. Cooper & Green 2016). However, the fear of criticism of less than perfect data hinders some individuals and institutions from sharing their data. It is therefore important that tools for assessing data quality are not used to shame and blame the original contributors but act as an incentive to work towards improving data quality. After all, it is up to the future users of legacy datasets to decide whether the data is fit for purpose, and if it is not, to develop strategies to enhance data quality. The good news is that good curation tools for digital data, such as graphs, are increasingly available, but we also need to find ways to pay and credit the curators.

Open Data

Ideally, using the FAIR principles and sharing both raw data and expertise can only be positive for research. The challenge for digital data curators will be to adopt Open Data and the FAIR Data principles as part of their business model without compromising their ability to attract funding in a competitive market. Acknowledgement of the value of data through FAIR Data principles is an important step in changing attitudes of funders. Moreover, Open and FAIR Data aligns with the Valletta Convention’s requirements to exchange scientific data.

A complex issue is the difference in policies and strategies towards Open Data in various parts of the world. Even inside the EU, and despite transnational protocols, the approaches vary enormously, from moving towards completely Open Access for public data to setting barriers for use by charging handling fees or even restricting access because of real or perceived privacy, copyright, ownership and security issues. The difference between federal and central states is another issue here. This is not something that the archaeological community can solve on its own, but it will be necessary to signal through international archaeological organisations, such as EAA, WAC or EAC, that archaeologists support and are committed to data sharing. This is also necessary to increase support for our work among non-archaeologists, and to maintain transparency about what we are doing as practitioners with data that in the end are paid for with public money.

Sustainability

Many old databases are stored in obsolete (or soon-to-be outdated) software formats, and only archived on media like CDs, zip-disks or even floppy disks. It is time to take action now to make sure that old data sets will not be lost forever within a few years’ time. In order to guarantee sustainability of legacy data it is first of all important to have a good overview of what is there but has not yet been shared. Data owners need to join forces to investigate the size of the problem, and to develop strategies for dealing with it.

The current shared data initiatives are the frontrunners, and we need to make sure that the practices developed there will be followed by data owners around the world in academia, government and private institutions. Publishing data papers and documenting good practices seem to be the most effective way to achieve this, since top-down solutions tend to be very vulnerable to the financing models applied and are often not distributed more widely because of institutional barriers.

Involving the archaeological community

Involving the wider archaeological community in working with linked data and machine learning requires showing that the advantages far outweigh the investment in understanding and applying the technologies involved. This proof should come from a careful comparative evaluation of these technologies in controlled experiments on a wide range of archaeological information types and archaeological projects, in a study that not only shows that the semantic models can correctly describe information, but that there is also a distinct advantage to this when compared to a baseline of common digital methods of representing archaeological information. Training can be organized through international initiatives, and should involve collaboration with computer scientists, as undertaken during the ARIADNE project. An organization like Computer Applications & Quantitative Methods in Archaeology (CAA 2019) could potentially play an important role in this.

Involving data owners and curators

Existing approaches to data collection and curation are ill-suited to the digital age. Compartmentalised roles of data creator and curator and archivist hinder maximising the potential of born digital data in general and the need to collate spatial data from multiple sources is recognised. The bigger map of the archaeological landscape has been lost to the immediacy of project-led archaeology lacking consistent data standards. Harnessing the potential of spatial data requires a collaborative approach through developing networks across the profession to help exploit the full potential of digital databases so that the data can be shared with other policy domains and linked to societal challenges – all sub-themes within the Amersfoort Agenda (Europae Archaeologiae Consilium 2015: 15–23). Doing so requires redesigning existing approaches to make them work more effectively. Digital data demands that it is collected once, combined seamlessly from different providers, maintained and published at the level where this can be done most effectively so that the data is findable, accessible, interoperable and re-usable.

The need for governance

The systemic failure to move towards a landscape approach for archaeological data was highlighted in a 2015 report by the Horizon 2020 Expert Group on Cultural Heritage (European Commission Directorate-General for Research and Innovation 2015) which found that spatial data about the historic environment should be at the heart of good decision making but is noticeable by its absence. The report identified the need to shift from ‘an object-oriented approach towards a spatial approach in heritage planning’ and to ‘consider cultural landscapes early as part of land use and spatial planning processes’ to get cultural heritage data to work for’.

Successful delivery of such a vision requires breaking down data silos to develop a sustainable infrastructure collating, maintaining and distributing data effectively. Authoritative datasets, such as designations, tend to follow well-defined schemas within national jurisdictions but are not easily interoperable across boundaries. Primary data collection from fieldwork is generally not standardised and requires harmonising across contributors. This raises the question of whose role it is to pull together these data, which may eventually be deposited in a monument record or an archive. Organisational structures of archives adopt a hierarchical approach to tidying specific projects into folders within specific collections and are not open to re-engineering and amalgamating data into datasets, let alone publishing that data online through a range of web services.

Delivery requires co-ordination through a thematic Spatial Data Infrastructure setting out the framework, policies, standards etc. to deliver the value from the wealth of spatial data in archaeology (McKeague et al. 2017). Yet that coordination across organisations and territories is conspicuous by its absence.

Conclusion

As this paper shows, the potential value of spatial data in heritage is currently not being realised. Beyond the fundamental issue of ensuring the long-term preservation of digital data in general, there is a lack of recognition of the value and potential of spatial data held in reports and datasets, notwithstanding the obvious benefit of standardising and sharing spatial data for research and to inform environmental policies and activities that may impact the cultural heritage. Technical solutions exist, whether through the Discovery Metadata, Web Map Services and Web Feature Services mandated by the INSPIRE Directive or through Linked Data approaches. However, solutions require capital investment in delivering change and a commitment to maintain services well into the future. With Linked Data, technical barriers constrain their implementation outside research institutes.

The lack of coordination in creating a sustainable future for spatial data – actual or legacy – is a major challenge for archaeological information management. The data-sharing envisaged by the Valletta Convention remains limited, mostly due to fear of looting, intellectual property issues and client confidentiality. Contrast the lack of coordination in archaeology with the approach adopted by geological sciences, where the need for, and value of, a standardized approach to spatial data was recognized and addressed from the early 2000s (Jackson 2007). Now over ten years old, the One Geology portal (One Geology 2019) provides access to standardized spatial data from 113 countries. Multi-national data sharing initiatives, including Pelagios, ARIADNE and ArkeoGIS, are very much the exception in archaeology.

The INSPIRE Directive should form a blueprint for developing an SDI for cultural heritage datasets to demonstrate the need for and potential of a coordinated approach not only for archaeologists but the wider society. This will inform the evidence base to develop a manifesto for implementing a spatial data strategy for archaeological datasets. One channel for preparing a manifesto and strategy would be establishing a working group on spatial data under the auspices of the EAC, ideally jointly with the EAA and Computer Applications and Quantitative Methods in Archaeology.

Acknowledgements

This paper developed from presentations held by the authors at the EAA2017 conference in Maastricht (30 August – 2 September 2017) in the session “Mapping our heritage. Towards a sustainable future for digital spatial information and technologies in archaeological heritage management”, organized by Philip Verhagen, Niels van Manen, Loup Bernard and Isto Huvila. Some of the presentations are available online (EAA 2019). Verhagen and Van Manen gratefully acknowledge the support of CLUE+, the Research Institute for Culture, Cognition and Heritage at Vrije Universiteit Amsterdam in organising this session. This paper is further based upon work from the COST Action ARKWORK (2019), supported by COST (European Cooperation in Science and Technology, www.cost.eu). Huvila’s work was partially supported by the Archaeological Information in the Digital Society (ARKDIS) project funded by the Swedish Research Council Grant 340-2012-5751.

Competing Interests

The authors have no competing interests to declare.

DOI: https://doi.org/10.5334/jcaa.23 | Journal eISSN: 2514-8362
Language: English
Submitted on: Dec 3, 2018
Accepted on: Apr 17, 2019
Published on: Jun 7, 2019
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2019 Peter McKeague, Rein van’t Veer, Isto Huvila, Anne Moreau, Philip Verhagen, Loup Bernard, Anwen Cooper, Chris Green, Niels van Manen, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.