Open Digital Data on Funerary Landscapes and Settlement Organisation in Pre-Roman Apulia: Monte Sannace and Vaste

Dominik Hagmann; Matthias Hoernes

doi:10.5334/joad.209

(1) Overview

Context

This paper presents the complete archaeological research data underlying the book chapter “Intra muros et extra: Funerary landscapes and settlement dynamics at Monte Sannace and Vaste in the fourth century BCE” [1]. It makes the “Intra muros et extra” (IMEE) dataset openly available and shares the project’s digital methodology.

The project examined the transformative period of late Classical and early Hellenistic Apulia through a detailed analysis of residential and burial spaces. In the late fourth to early third centuries BCE, rapid settlement growth, urbanization, the construction of fortification walls, and the infilling of the countryside reshaped the landscape of southern and central Apulia. These developments redefined relationships between residential and burial areas, producing significant yet locally varied changes in funerary landscapes. Focusing on Monte Sannace and Vaste as case studies, the book chapter [1] analyzed a qualitative and quantitative spatial dataset of 379 funerary features, contextualized within their settlements and broader surroundings through a comprehensive review of related scholarship [2] (Figure 1).

Figure 1: Geographic location and spatial distribution of funerary features at Monte Sannace and Vaste (Apulia, Italy). **(a)** Global and regional location of the study area; **(b)** position of Monte Sannace and Vaste within southern Italy; **(c)** distribution of funerary features at Monte Sannace; **(d)** distribution of funerary features at Vaste. (D. Hagmann 2026; data: Natural Earth, Orthophoto [2019] acquired from the Italian Agency for Agricultural Payments [AGEA] via the PugliaCon Spatial Information System [SIT]).

The two sites are well suited to this analysis, given their rich archaeological record, extensive documentation and high-quality publications. Because the communities of Apulia followed diverse, locally specific patterns when reshaping their residential and funerary landscapes, the two sites serve only as test cases. They form a pilot study for scaling the analysis up to more than 40 fortified sites in the region, which vary considerably in their state of research and some of which are known primarily from grey literature.

The dataset presented here records 379 funerary features from Monte Sannace and Vaste, combining individual tombs and clustered funerary contexts in a single reusable table. Each record is identified by a unique numerical identifier (ID) and includes information on site, deposition type, tomb type, spatial setting, chronology, burial counts, mapping method and quality, notes, and bibliographic references. Of these records, 347 represent individual funerary features and 32 represent clustered features; 290 records belong to Monte Sannace and 89 to Vaste. Burial counts are known for 367 records, which together represent at least 474 buried individuals: 286 funerary features (at least 379 individuals) at Monte Sannace and 81 features (at least 95 individuals) at Vaste. For the remaining 12 records (Monte Sannace: 4; Vaste: 8), the counts are unknown because no values have been published. In most cases, anthropological analyses, such as estimates of the minimum number of individuals, are missing, which is particularly problematic given the widespread reuse of tombs in the region. Spatial information is provided for 340 records, while the remaining 39 records are retained with NULL coordinates to preserve the full evidentiary base. Supporting files include data and value dictionaries, a CRS metadata table, a KML derivative for visualization, and the bibliography.
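As a quick consistency check, per-site figures such as these can be recomputed from IMEE_dataset.csv. The following is a minimal sketch using only Python's standard library. It assumes the column names given in the field list of Section 3 and treats NULL as missing; the toy rows and the Deposition_type strings are invented for illustration.

```python
import csv
import io
from collections import Counter

NULL = "NULL"  # literal missing-value marker used throughout the CSV files


def summarize(rows):
    """Count records, deposition types, burial counts and located points per site."""
    summary = {}
    for row in rows:
        stats = summary.setdefault(row["Site"], {
            "records": 0,
            "deposition_types": Counter(),
            "with_burial_count": 0,  # records whose Tomb_burials is not NULL
            "min_individuals": 0,    # sum of the (minimum) burial counts
            "located": 0,            # records with an EPSG:4326 point
        })
        stats["records"] += 1
        stats["deposition_types"][row["Deposition_type"]] += 1
        if row["Tomb_burials"] != NULL:
            stats["with_burial_count"] += 1
            stats["min_individuals"] += int(float(row["Tomb_burials"]))
        if row["WGS84_EPSG4326_Longitude"] != NULL:
            stats["located"] += 1
    return summary


# Invented toy rows for illustration only; not records from IMEE_dataset.csv.
SAMPLE = """ID,Site,Deposition_type,Tomb_burials,WGS84_EPSG4326_Longitude
1,Monte Sannace,individual,1,16.98
2,Monte Sannace,clustered,3,NULL
3,Vaste,individual,NULL,18.39
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for site, stats in summarize(rows).items():
    print(site, stats)
```

To run it on the published table, a file opened with `open("IMEE_dataset.csv", newline="", encoding="utf-8")` can be passed to `csv.DictReader` instead of the toy string. The UTF-8 encoding is an assumption.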

Spatial coverage (EPSG:4326)

Description: Monte Sannace and Vaste are major archaeological sites in Apulia, southern Italy. Monte Sannace lies near Gioia del Colle in the Metropolitan City of Bari, while Vaste, also known as Vaste di Poggiardo, is situated near Poggiardo in the province of Lecce.

Northern boundary: 40.8403521923376971

Southern boundary: 40.0315152839819035

Eastern boundary: 18.4013056221726998

Western boundary: 16.9602188454462990
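A bounding box of this kind can be derived from the EPSG:4326 columns of IMEE_dataset.csv. The sketch below is illustrative only. The paper does not state how the boundaries above were computed (for instance, whether a buffer was added), and the toy coordinates are invented.

```python
import csv
import io

NULL = "NULL"  # literal missing-value marker used in the CSV files


def bounding_box(rows, lon_field="WGS84_EPSG4326_Longitude",
                 lat_field="WGS84_EPSG4326_Latitude"):
    """Return (west, south, east, north) over records with both coordinates."""
    lons, lats = [], []
    for row in rows:
        lon, lat = row[lon_field], row[lat_field]
        if lon == NULL or lat == NULL:
            continue  # unlocated records carry NULL and are skipped
        lons.append(float(lon))
        lats.append(float(lat))
    if not lons:
        raise ValueError("no records with EPSG:4326 coordinates")
    return min(lons), min(lats), max(lons), max(lats)


# Invented toy rows for illustration only; not records from the dataset.
SAMPLE = """ID,WGS84_EPSG4326_Longitude,WGS84_EPSG4326_Latitude
1,16.96,40.84
2,18.40,40.03
3,NULL,NULL
"""

west, south, east, north = bounding_box(csv.DictReader(io.StringIO(SAMPLE)))
print(f"N {north}  S {south}  E {east}  W {west}")
```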

Temporal coverage

Both sites span multiple periods of the region's pre-Roman and Roman history, with most of the residential and funerary data dating to the fourth and third centuries BCE. The dataset covers archaeological features from around 700/600 BCE to around 300/250 BCE. During this period, the region was inhabited by Italic communities known as the Peucetians in central Apulia and the Messapians in the south. The dates provided are mostly based on typochronologies, primarily of pottery, and are taken from the excavation publications of the sites (for full references, see the bibliography in the dataset).

(2) Methods

For the management of digital archaeological information and spatial data analysis, we took a pragmatic approach, combining proprietary and free and open-source software (FOSS) [3].

All data was securely stored in the cloud using Google Drive. We then adapted the data for analysis in a geographic information system (GIS), where it was also centrally managed. The qualitative dataset was subsequently loaded as non-spatial tables in CSV (Comma-Separated Values) format into the FOSS GIS software QGIS (version 3.30.2 's-Hertogenbosch), installed on a Windows 11 Pro device. Spatial point data, stored as Esri Shapefiles (SHP and auxiliary files), provided the quantitative-technical information on our area of interest.

We used open geodata as the underlying basis. For the terrain model and related derivations, we used the TINITALY dataset at resolutions of 10 × 10 m and 100 × 100 m, respectively [4]. For administrative units, we used data from the European Commission's Geographic Information System (GISCO), sourced from EuroGeographics/Eurostat, as well as data from the Italian Geoportale Nazionale (Ministero dell'Ambiente e della Tutela del Territorio e del Mare).

For open-access data dissemination and long-term data archiving [3], data was exported to CSV and KML (Keyhole Markup Language) files; supplementary documentation is provided in MD (Markdown) format. All data has been made available through the Zenodo repository (infra).

Multiple coordinate reference systems (CRS) were used [5]: The source-point coordinates were mapped in EPSG:3004 – Monte Mario/Italy zone 2, the historical Italian Gauss-Boaga CRS used for eastern Italy and appropriate for Apulia. Furthermore, to improve interoperability, the dataset also provides transformed coordinate pairs in EPSG:4326 – WGS 84, EPSG:32633 – WGS 84/UTM zone 33N, and EPSG:25833 – ETRS89/UTM zone 33N. Firstly, EPSG:4326 is included to ensure compatibility with web-mapping environments and is also used for the KML derivative. Secondly, EPSG:32633 is provided because of its wide use as a projected CRS and its broad support in GIS software. Thirdly, EPSG:25833 is included as the preferred CRS for many recent European geodata workflows, as it is based on ETRS89 and is therefore well aligned with continental-scale European spatial reference standards.

Steps

We systematically collected cultural data based on the secondary literature and structured it in Excel/Microsoft 365 tables (in XLSX file format). To ensure sustainable data management, we established a well-organized folder structure. In addition to our own data collection, we incorporated archaeological datasets from previous studies on Monte Sannace and Vaste. These datasets were kindly provided by the respective authors [6, 7] and served as valuable secondary data sources (infra). After documenting all identifiable tombs based on the secondary literature, we mapped these tombs as point data in QGIS using the qualitative data we had generated. Finally, we joined the spatial and qualitative data to generate the full IMEE dataset, which was then uploaded to Zenodo in CSV and KML format.
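The final join was carried out in QGIS. A script-based equivalent might look like the sketch below, which performs a left join on the shared ID so that features without a mapped point are kept with NULL coordinates rather than dropped. The two-table layout and the toy values are hypothetical, not the project's actual working files.

```python
import csv
import io

NULL = "NULL"
COORD_FIELDS = ("Monte_Mario_EPSG3004_Easting", "Monte_Mario_EPSG3004_Northing")


def left_join_by_id(qualitative, spatial, coord_fields=COORD_FIELDS):
    """Attach point coordinates to every qualitative record via the shared ID.

    Records without a mapped point keep NULL coordinates instead of being
    dropped, so unlocated features remain part of the table.
    """
    points = {}
    for row in spatial:
        if row["ID"] in points:
            raise ValueError(f"duplicate ID in spatial table: {row['ID']}")
        points[row["ID"]] = row
    joined = []
    for row in qualitative:
        point = points.get(row["ID"])
        merged = dict(row)
        for field in coord_fields:
            merged[field] = point[field] if point is not None else NULL
        joined.append(merged)
    return joined


# Hypothetical toy tables; not taken from the dataset.
QUALITATIVE = """ID,Site,Tomb_type
1,Vaste,pit grave
2,Vaste,NULL
"""
SPATIAL = """ID,Monte_Mario_EPSG3004_Easting,Monte_Mario_EPSG3004_Northing
1,2809000.0,4437000.0
"""

joined = left_join_by_id(
    csv.DictReader(io.StringIO(QUALITATIVE)),
    csv.DictReader(io.StringIO(SPATIAL)),
)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(joined[0].keys()))
writer.writeheader()
writer.writerows(joined)
print(out.getvalue())
```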

Quality Control

To ensure the accuracy and reliability of our dataset, we implemented rigorous quality control measures: we repeatedly conducted thorough checks and validations to examine the consistency and coherence of the data, and we repeatedly reviewed the cultural-qualitative tables to ensure that the information was accurately recorded and appropriately categorized. Throughout the process, we maintained detailed documentation of our quality control procedures and of any adjustments made to the dataset.
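The individual checks are not listed here. As a hypothetical illustration of automated consistency rules that could complement manual review, the sketch below flags duplicate IDs, unexpected Site values, half-filled coordinate pairs and reversed BCE date ranges. The rules and the toy rows are assumptions, not the authors' procedure.

```python
import csv
import io

NULL = "NULL"
SITES = {"Monte Sannace", "Vaste"}
COORD_PAIRS = (
    ("Monte_Mario_EPSG3004_Easting", "Monte_Mario_EPSG3004_Northing"),
    ("WGS84_EPSG4326_Longitude", "WGS84_EPSG4326_Latitude"),
)


def check_records(rows):
    """Return (ID, message) tuples for records that fail simple consistency rules."""
    problems = []
    seen = set()
    for row in rows:
        rid = row["ID"]
        if rid in seen:
            problems.append((rid, "duplicate ID"))
        seen.add(rid)
        if row["Site"] not in SITES:
            problems.append((rid, f"unexpected Site value {row['Site']!r}"))
        for x_field, y_field in COORD_PAIRS:
            # a coordinate pair must be either complete or entirely NULL
            if (row[x_field] == NULL) != (row[y_field] == NULL):
                problems.append((rid, f"incomplete pair {x_field}/{y_field}"))
        start, end = row["Date_start_BCE"], row["Date_end_BCE"]
        # BCE years count down, so the older start must not be smaller than the end
        if NULL not in (start, end) and float(start) < float(end):
            problems.append((rid, "Date_start_BCE is younger than Date_end_BCE"))
    return problems


# Invented toy rows: the first is valid, each later row carries one error.
HEADER = ("ID,Site,Monte_Mario_EPSG3004_Easting,Monte_Mario_EPSG3004_Northing,"
          "WGS84_EPSG4326_Longitude,WGS84_EPSG4326_Latitude,Date_start_BCE,Date_end_BCE\n")
SAMPLE = HEADER + """1,Vaste,NULL,NULL,NULL,NULL,350,300
1,Vaste,NULL,NULL,NULL,NULL,350,300
2,Monte Sanace,NULL,NULL,NULL,NULL,NULL,NULL
3,Monte Sannace,2687000,NULL,NULL,NULL,400,350
4,Vaste,NULL,NULL,NULL,NULL,300,350
"""

for record_id, message in check_records(csv.DictReader(io.StringIO(SAMPLE))):
    print(record_id, message)
```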

Constraints

As secondary data [8], this dataset offers several benefits. It has already been collected, saving time and resources, and it provides a broad geographic and temporal scope, enabling analyses that might not be feasible through primary data collection alone. However, users should be aware of its limitations: they have no control over how the original data was collected and must therefore rely on the accuracy and reliability of the original sources, as mapped and analyzed by their respective authors.

(3) Dataset description

The dataset consists of nine files and combines an authoritative analytical table, interoperable spatial derivatives, machine-readable and human-readable documentation, and structured bibliographies. Its structure is designed to support both direct archaeological analysis and reuse in GIS, database, repository, and publication workflows.

Object name

README.md – overview of the dataset package, spatial data, conventions and recommended use.
IMEE_dataset.csv – consolidated primary table containing 379 funerary feature records from Monte Sannace and Vaste.
IMEE_dataset.kml – derivative KML file for visualizing the 340 spatially located records.
IMEE_data_dictionary.csv – field-level documentation for the primary table.
IMEE_value_dictionary.csv – explanations of controlled values and missing-value convention.
IMEE_coordinate_reference_systems.csv – metadata for the coordinate reference systems used in the dataset.
IMEE_references.bib – bibliography in BibTeX format.
IMEE_references_readable.csv – human-readable bibliography table.
IMEE_references_readable.txt – human-readable bibliography text file.

Data type

The central file is IMEE_dataset.csv, a CSV table containing the consolidated analytical dataset. It brings together in one main table the individual and clustered funerary contexts of Monte Sannace and Vaste. Each record is identified by a unique ID. The main table contains spatial, qualitative and quantitative archaeological information. The recorded fields include the site, deposition type, excavation or discovery year, feature number, tomb count, tomb type, spatial zone, sector and location, location quality, coordinate values, chronological information, dating method, burial count, burial-related notes, mapping method and bibliographic references.

Spatial coordinates are included directly in IMEE_dataset.csv. The source coordinate pair is provided in EPSG:3004 (Monte Mario/Italy zone 2). To improve interoperability, the dataset also includes transformed coordinate pairs in EPSG:4326 (WGS 84), EPSG:32633 (WGS 84/UTM zone 33N) and EPSG:25833 (ETRS89/UTM zone 33N).

The CRS names, datums, units, axis order and explanatory notes for these coordinate columns are documented in the CSV file IMEE_coordinate_reference_systems.csv. The derived coordinate pairs were calculated from EPSG:3004 with pyproj/PROJ EPSG operations. The numerical precision of these coordinates should not be confused with archaeological location certainty, which is documented separately in the Location_quality and Mapping fields.
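A minimal sketch of this kind of derivation with pyproj follows. It is not the authors' script: the input point is arbitrary and invented, and the transformation PROJ selects for Monte Mario (and therefore the last metres of the result) can depend on the PROJ version and installed grids. Comparing the output against the published columns is therefore advisable.

```python
from pyproj import Transformer

SOURCE_CRS = "EPSG:3004"  # Monte Mario / Italy zone 2, the source coordinates
TARGET_CRSS = ("EPSG:4326", "EPSG:32633", "EPSG:25833")

# always_xy=True keeps the (easting, northing) -> (longitude, latitude) order,
# overriding the latitude-first axis order that EPSG:4326 officially defines.
TRANSFORMERS = {
    target: Transformer.from_crs(SOURCE_CRS, target, always_xy=True)
    for target in TARGET_CRSS
}


def derive_coordinates(easting, northing):
    """Return {target CRS: (x, y)} for one EPSG:3004 coordinate pair."""
    return {
        target: transformer.transform(easting, northing)
        for target, transformer in TRANSFORMERS.items()
    }


# Arbitrary illustrative point; not a record from the dataset.
for crs, (x, y) in derive_coordinates(2687000.0, 4517000.0).items():
    print(crs, round(x, 6), round(y, 6))
```

`pyproj.transformer.TransformerGroup("EPSG:3004", "EPSG:4326")` can be used to list the candidate operations PROJ considers for a given pair of CRSs.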

For missing data, the value NULL is used throughout the CSV files [9]. It marks values that are unknown or not applicable and should be treated as a literal missing-value marker, not as a textual observation.
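For reuse in scripts, the marker has to be mapped to a native missing value before numeric fields are cast. A minimal standard-library sketch follows. The set of numeric columns is an assumption based on the field list below, and the toy rows are invented.

```python
import csv
import io

NULL = "NULL"  # literal missing-value marker, never a textual observation

# Assumed numeric columns, chosen from the documented field list.
INT_FIELDS = {"ID", "Tomb_count", "Tomb_burials", "Date_start_BCE", "Date_end_BCE"}
FLOAT_FIELDS = {
    "Monte_Mario_EPSG3004_Easting", "Monte_Mario_EPSG3004_Northing",
    "WGS84_EPSG4326_Longitude", "WGS84_EPSG4326_Latitude",
}


def parse_row(row):
    """Convert NULL to None and cast the assumed numeric fields."""
    parsed = {}
    for field, value in row.items():
        if value == NULL:
            parsed[field] = None
        elif field in INT_FIELDS:
            parsed[field] = int(value)
        elif field in FLOAT_FIELDS:
            parsed[field] = float(value)
        else:
            # text and categorical values (e.g. "not documented") stay strings
            parsed[field] = value
    return parsed


def read_imee(text_stream):
    """Parse every CSV row from an open text stream."""
    return [parse_row(row) for row in csv.DictReader(text_stream)]


# Invented toy rows for illustration only; not records from the dataset.
SAMPLE = """ID,Tomb_burials,Dating_method,WGS84_EPSG4326_Longitude
1,2,literature-based periodization,16.98
2,NULL,not documented,NULL
"""

records = read_imee(io.StringIO(SAMPLE))
print(records)
```

In pandas, `pd.read_csv("IMEE_dataset.csv", na_values=["NULL"], keep_default_na=False)` has a similar effect. It restricts missing-value detection to NULL rather than pandas' broader default list, which would also convert strings such as "NA".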

IMEE_dataset.csv contains the following fields:

ID is the stable unique integer identifier assigned to each record. It harmonizes the former tomb, cluster and feature identifiers and allows each record to be referenced consistently across the dataset package.
Site identifies the archaeological site to which the record belongs. The values used are Monte Sannace and Vaste.
Deposition_type defines the analytical granularity of the record. It distinguishes between individually recorded funerary features and clustered features.
Year records the excavation, discovery or publication year as reported in the relevant source. The field is retained as text because some entries include approximate values, ranges or source-specific chronological formulations.
Feature_no records the original tomb number, feature number or label assigned in the source documentation or secondary literature. Where no such number is available, the value is recorded as NULL.
Tomb_count gives the minimum number of tombs represented by the row. Individually recorded features are counted as one, whereas clustered features retain the reported minimum number of tombs where this information is available.
Tomb_count_qualifier qualifies the value in Tomb_count. The value at least indicates that the recorded number is a minimum count rather than a complete or certain total. NULL is used where no qualifier is required or applicable.
Tomb_type describes the tomb or deposition type, where this information is available. The field records controlled textual values derived from the archaeological sources and may include values such as sarcophagus, cist grave, pit grave or other source-specific tomb forms.
Tomb_zone records the broadest spatial unit within the site. It is used for larger spatial divisions and, within a simple hierarchical structure, is treated as broader than Tomb_sector.
Tomb_sector records a more detailed, named sector, property, excavation area or sub-area within the broader zone. It provides a more precise spatial attribution where such information is available in the source material.
Tomb_location provides specific location information within a sector or zone, usually in the form of a descriptive reference to a particular place, plot, excavation context, street, building, property or mapped area.
Location_quality evaluates the spatial accuracy of each record according to a simple hierarchical structure. The value exactly located indicates a high level of accuracy, where the available source information allows the feature to be placed at or very close to its recorded position. Contextually located indicates a medium level of accuracy, where the feature can be assigned to a documented archaeological context, but not with feature-level precision; such placements are mostly derived from broader descriptions or the overall archaeological context. Zonally located indicates a low level of accuracy, where only a broader zone, sector, or property is known. Conflicting information marks cases in which published spatial information is inconsistent across sources.
Monte_Mario_EPSG3004_Easting records the projected easting coordinate in EPSG:3004, Monte Mario/Italy zone 2. This is the source Gauss-Boaga coordinate pair used for the dataset. NULL indicates that no reliable point coordinate is available.
Monte_Mario_EPSG3004_Northing records the projected northing coordinate in EPSG:3004, Monte Mario/Italy zone 2. Together with the corresponding easting value, it forms the original projected coordinate pair.
WGS84_EPSG4326_Longitude records the longitude in EPSG:4326, WGS 84, expressed in decimal degrees. This coordinate is derived from the EPSG:3004 source coordinates and is used by the KML derivative.
WGS84_EPSG4326_Latitude records the latitude in EPSG:4326, WGS 84, expressed in decimal degrees. Together with the longitude field, it allows the data to be visualized in web-mapping environments and virtual globe applications.
WGS84_UTM33N_EPSG32633_Easting records the projected easting coordinate in EPSG:32633, WGS 84/UTM zone 33N. This value is derived from the EPSG:3004 source coordinates and supports reuse in widely supported GIS workflows.
WGS84_UTM33N_EPSG32633_Northing records the projected northing coordinate in EPSG:32633, WGS 84/UTM zone 33N. Together with the corresponding easting value, it provides a projected coordinate pair based on the global WGS 84 datum.
ETRS89_UTM33N_EPSG25833_Easting records the projected easting coordinate in EPSG:25833, ETRS89/UTM zone 33N. This value is derived from the EPSG:3004 source coordinates and is included for compatibility with contemporary European geodata workflows.
ETRS89_UTM33N_EPSG25833_Northing records the projected northing coordinate in EPSG:25833, ETRS89/UTM zone 33N. Together with the corresponding easting value, it provides a European-standard projected coordinate pair.
Date_start_BCE records the older boundary of the chronological range in BCE. Where composite source values were present, they were normalized and split into numeric start and end fields; the numeric value uses the first or older alternative where applicable.
Date_end_BCE records the younger boundary of the chronological range in BCE. It defines the later end of the normalized date range.
Date_phase records the broad chronological phase code assigned to the record. Multiple phase codes separated by semicolons indicate that the dating spans more than one chronological phase. The value 0 indicates that no phase was assigned because the record is undated, too broadly dated or not documented. The value 1 refers to the earlier or pre-350 BCE phase used in the classification. The value 2 refers to the ca. 350–300 BCE phase. The value 3 refers to the later fourth- to third-century BCE phase. The value 4 refers to the later Hellenistic phase.
Dating_method records the basis on which the date was assigned; the value literature-based periodization means that dating derives from chronological interpretation reported in the cited literature. The value not documented means that no explicit dating information is recorded in the dataset. This is a categorical value, not a missing-value marker.
Date_note preserves the original composite dating information and any relevant normalization note. It functions as a transparent link between the source wording and the normalized numeric date fields.
Tomb_burials records the minimum number of buried individuals represented by the funerary feature or cluster. Age-related wording and uncertainty information are stored in Tomb_burials_note.
Tomb_burials_qualifier qualifies the value in Tomb_burials. The value at least marks a minimum number, while uncertain indicates that the burial count is not secure. NULL is used where no qualifier is required or where the number of burials is unknown or not applicable.
Tomb_burials_note preserves the original wording relating to burial count, age attribution or uncertainty. In particular, this field retains source-specific information that cannot be adequately represented as a simple numeric value.
Notes contains additional source notes, observations or contextual information that do not fit into the other structured fields.
Mapping describes how the spatial point was generated or why no point could be generated; the value located based on map means that the point coordinates were derived from a published map. Located based on orthophoto means that coordinates were generated through orthophoto interpretation. Roughly located based on map indicates an approximate point derived from a published map. Roughly located based on Carta archeologica indicates an approximate point derived from the officially published regional archaeological gazetteer. Exactly located based on Carta archeologica indicates that the feature could be placed precisely on the basis of the registry. Contextually located means that the point represents a more broadly documented archaeological context rather than the exact position of an individual feature. Not located means that no usable point coordinate is available. To be located marks records that were flagged for later georeferencing in the original data.
References contains short human-readable references for the record. Expanded bibliographic data are provided separately in IMEE_references.bib, IMEE_references_readable.csv and IMEE_references_readable.txt.
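Because Date_phase may hold several semicolon-separated codes and the BCE fields count downwards, filtering by phase or date range needs some care. The hypothetical helpers below sketch one way to do this. The exact casing of stored Location_quality values and the toy rows are assumptions.

```python
NULL = "NULL"


def parse_phases(value):
    """Split a Date_phase value such as "2;3" into a set of integer codes."""
    if value == NULL or not value.strip():
        return set()
    return {int(part) for part in value.split(";") if part.strip()}


def overlaps(row, older_bce, younger_bce):
    """True if the record's normalized range overlaps older_bce..younger_bce.

    BCE years count down, so a range runs from the larger (older) number
    to the smaller (younger) one.
    """
    start, end = row["Date_start_BCE"], row["Date_end_BCE"]
    if NULL in (start, end):
        return False
    return float(start) >= younger_bce and float(end) <= older_bce


def select(rows, phase, qualities):
    """Records assigned to `phase` whose Location_quality is in `qualities`."""
    wanted = {q.lower() for q in qualities}  # casing of stored values is assumed
    return [
        row for row in rows
        if phase in parse_phases(row["Date_phase"])
        and row["Location_quality"].lower() in wanted
    ]


# Invented toy rows for illustration only; not records from the dataset.
ROWS = [
    {"ID": "1", "Date_phase": "2;3", "Location_quality": "exactly located",
     "Date_start_BCE": "350", "Date_end_BCE": "275"},
    {"ID": "2", "Date_phase": "1", "Location_quality": "zonally located",
     "Date_start_BCE": "450", "Date_end_BCE": "375"},
    {"ID": "3", "Date_phase": "0", "Location_quality": "exactly located",
     "Date_start_BCE": "NULL", "Date_end_BCE": "NULL"},
]

phase3 = select(ROWS, 3, {"exactly located", "contextually located"})
print([row["ID"] for row in phase3])
print([row["ID"] for row in ROWS if overlaps(row, 400, 350)])
```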

In addition to the main CSV table, the dataset includes IMEE_dataset.kml, a KML file for rapid spatial visualization in GIS software and virtual globe applications such as Google Earth. This derivative file contains the 340 spatially located records and uses the EPSG:4326 longitude and latitude columns. Each KML placemark uses the dataset ID as its identifier, includes a short descriptive name, and provides selected information from the main table in the placemark description. The KML file is intended for exploratory visualization and spatial orientation; however, IMEE_dataset.csv remains the authoritative analytical table.
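How the KML derivative was produced is not described. As a hypothetical sketch, placemarks of this kind can be generated with Python's xml.etree.ElementTree. The chosen description fields, the `IMEE_` id prefix and the toy rows are assumptions, not the structure of the published file.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
NULL = "NULL"
ET.register_namespace("", KML_NS)  # serialize KML as the default namespace


def _tag(name):
    return f"{{{KML_NS}}}{name}"


def build_kml(rows, description_fields=("Site", "Tomb_type", "Location_quality")):
    """Return a KML document with one Placemark per spatially located record."""
    kml = ET.Element(_tag("kml"))
    document = ET.SubElement(kml, _tag("Document"))
    for row in rows:
        lon = row["WGS84_EPSG4326_Longitude"]
        lat = row["WGS84_EPSG4326_Latitude"]
        if NULL in (lon, lat):
            continue  # records without coordinates get no placemark
        # XML ids may not start with a digit, hence the hypothetical prefix
        placemark = ET.SubElement(document, _tag("Placemark"), id=f"IMEE_{row['ID']}")
        ET.SubElement(placemark, _tag("name")).text = f"{row['Site']} {row['ID']}"
        ET.SubElement(placemark, _tag("description")).text = "; ".join(
            f"{field}: {row[field]}" for field in description_fields
            if row[field] != NULL
        )
        point = ET.SubElement(placemark, _tag("Point"))
        # KML expects longitude first: lon,lat[,altitude]
        ET.SubElement(point, _tag("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Invented toy rows for illustration only; not records from the dataset.
ROWS = [
    {"ID": "1", "Site": "Vaste", "Tomb_type": "pit grave",
     "Location_quality": "exactly located",
     "WGS84_EPSG4326_Longitude": "18.39", "WGS84_EPSG4326_Latitude": "40.05"},
    {"ID": "2", "Site": "Vaste", "Tomb_type": "NULL",
     "Location_quality": "zonally located",
     "WGS84_EPSG4326_Longitude": "NULL", "WGS84_EPSG4326_Latitude": "NULL"},
]

kml_text = build_kml(ROWS)
print(kml_text)
```

To save the result, the returned string can be written to a .kml file with UTF-8 encoding.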

Three further documentation files support interpretation and reuse of the tabular data. Besides IMEE_coordinate_reference_systems.csv (described above), IMEE_data_dictionary.csv is a CSV data dictionary describing every field in the main table, including its meaning, expected format or controlled values, and relevant notes. IMEE_value_dictionary.csv, also provided as CSV, explains controlled values such as Date_phase, Location_quality, Mapping and Dating_method, and documents the NULL convention.

Bibliographic information is provided in three complementary forms: IMEE_references.bib is a BibTeX file containing the structured reference data used by the dataset and is intended for citation management and reproducible referencing. For easier reading outside reference-management software, the same bibliography is also supplied as IMEE_references_readable.csv in CSV format and as IMEE_references_readable.txt in plain text format. The CSV version allows the references to be filtered, searched or reused in tabular workflows, whereas the plain-text version provides a simple human-readable reference list.

Furthermore, README.md, included within the file set itself, provides a detailed overview of the dataset package, documenting the spatial data, file-naming conventions, missing-value encoding and recommended use of the dataset.

Finally, a public Zotero group library serves as a central repository for storing and organizing the references and sources related to the project. It allows multiple researchers to manage the reference collection collaboratively and gives all interested researchers access to the relevant literature, making it a convenient way to keep the references for the IMEE dataset comprehensive and up to date: https://www.zotero.org/groups/5069471/intra_muros_et_extra.

Format names and versions

BIB; CSV; KML; MD; TXT

Creation dates

From 21/12/2021 to 31/05/2026.

Language

English, Italian.

License

https://creativecommons.org/licenses/by/4.0/

Repository location

https://doi.org/10.5281/zenodo.20451742

Publication date

31/05/2026

(4) Reuse potential

Overall, the dataset offers a reusable and scalable basis for future archaeological, bioarchaeological, and computational research on funerary landscapes in Apulia and beyond. It supports comparative analysis, interdisciplinary integration and teaching, while combining a FAIR-oriented technical structure – supporting findability, accessibility, interoperability and reusability – with a CARE-informed awareness of collective benefit, responsibility and ethical-contextual sensitivity [3, 10].

The structured CSV format, standardized metadata and spatial coordinates make the dataset readily reusable in GIS, relational databases, statistical software and reproducible research workflows. Its main value lies in its function as a structured registry of funerary sites and features from Monte Sannace and Vaste in the fourth century BCE. Because it was developed in the context of the book chapter “Intra muros et extra: Funerary landscapes and settlement dynamics at Monte Sannace and Vaste in the fourth century BCE” [1], it already provides an in-depth test case for a data model that can be used for various forms of social-archaeological analysis and could be scaled to a broader regional level in future holistic projects including additional sites from Apulia. As shown in [1], possible applications include mapping intra- and extra-mural burial patterns, assessing the distribution of graves in relation to walls, roads, gates, settlement areas or topographical features, and comparing funerary clustering between Monte Sannace and Vaste. Further applications could include density mapping, nearest-neighbour analysis, distance-based modelling, chronological mapping of burial activity and comparative studies of settlement–cemetery relations across multiple sites.
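Nearest-neighbour analysis, one of the applications named above, can be sketched in a few lines. The example below computes nearest-neighbour distances and the Clark-Evans ratio from projected coordinates, such as the EPSG:32633 or EPSG:25833 columns, whose units are metres. The choice of index, the reference area and the toy points are assumptions for illustration.

```python
import math


def nearest_neighbour_distances(points):
    """Brute-force distance to each point's nearest neighbour (CRS units, metres for UTM)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    distances = []
    for i, (xi, yi) in enumerate(points):
        distances.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))
    return distances


def clark_evans_ratio(points, area_m2):
    """Observed mean nearest-neighbour distance divided by its expectation
    under complete spatial randomness, 1 / (2 * sqrt(n / area)).
    R < 1 suggests clustering, R > 1 regular spacing; no edge correction."""
    n = len(points)
    observed = sum(nearest_neighbour_distances(points)) / n
    expected = 0.5 / math.sqrt(n / area_m2)
    return observed / expected


# Invented projected coordinates (metres); not records from the dataset.
POINTS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
print(nearest_neighbour_distances(POINTS))
print(clark_evans_ratio(POINTS, area_m2=100.0))
```

The input should be restricted to suitably located records, for example using Location_quality. The choice of reference area, such as the walled area or a buffered hinterland, strongly affects R. Without edge correction, the ratio is only a first indication.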

More specifically, for archaeology, the dataset offers a comparative basis for studying urbanization, funerary landscapes, social organization and regional settlement dynamics and can be combined with further excavation data, survey results, ceramic studies, architectural documentation and legacy gazetteers.

The dataset is also relevant for bioarchaeologists because it provides a spatial and contextual framework to which osteological, palaeopathological, isotopic, aDNA and other biomolecular data can be linked. Although such data are not yet included in detail, the registry can serve as a backbone for future analyses of age-at-death, sex estimation, stature, trauma, pathology, diet, mobility or biological relatedness wherever the relevant evidence is available. This would allow researchers to examine, for example, whether specific demographic groups were buried in particular areas, whether non-local individuals cluster spatially, or whether burial location, grave architecture, grave goods and biological profiles are correlated.

The reuse potential is further strengthened by the dataset's alignment with FAIR and CARE principles [10]. Its structured format, explicit coordinates, standardized fields and documentation support FAIR-aligned data use. At the same time, the data should be handled in accordance with CARE-oriented considerations, since funerary evidence relates to human remains, burial practices and culturally meaningful mortuary spaces. Reuse should therefore include proper citation, transparent acknowledgement of uncertainties, careful treatment of spatial and chronological limitations and context-sensitive interpretation. Through the documentation of the bibliographic sources and the qualitative evaluation of the available information, the dataset already contributes to such responsible reuse by making the evidential basis and its limitations explicit. This becomes particularly relevant if future expansions integrate osteological, biomolecular or other sensitive bioarchaeological data.

The dataset also has potential for computational applications, especially artificial intelligence (AI). As a structured and spatially explicit registry, it could be used for supervised or semi-supervised classification of funerary contexts, pattern recognition in burial distributions, clustering of graves by spatial, chronological or contextual attributes, and predictive modelling of funerary zones at regional scale. In combination with other datasets, it could support machine-learning (ML) approaches to detect recurring settlement–cemetery configurations, identify outliers or under-documented patterns, and assist in the harmonization of heterogeneous legacy data. Such applications would not replace archaeological interpretation, but could help to formulate new research questions, test regional-scale models and evaluate whether the data structure is suitable for larger AI-assisted analyses of funerary landscapes [11, 12].

Finally, the dataset has clear pedagogical value for teaching archaeology and digital methods. It can be used as an open educational resource (OER) in courses on funerary archaeology, settlement archaeology, GIS, archaeological data modelling and digital humanities to train students in working with real, spatially explicit and imperfect archaeological data. Possible teaching applications include exercises in data cleaning, controlled vocabularies, spatial visualization, uncertainty assessment, intra- and extra-mural burial analysis, and the critical interpretation of funerary landscapes. Because the dataset combines archaeological context, spatial information and structured metadata, it allows students to move from basic data handling to more advanced questions concerning settlement dynamics, mortuary practices, data quality, FAIR/CARE principles and the methodological limits of computational analysis in archaeology.

Acknowledgements

We are very grateful to Giovanni Mastronuzzi and Paola Palmentola, as well as to Marco Di Lieto and Fabio Galeandro, who generously provided us with datasets from previous studies.

Author Contributions

Dominik Hagmann: Conceptualization; Data curation; Methodology; Project administration; Visualization; Writing – original draft; Writing – review & editing. Matthias Hoernes: Conceptualization; Formal analysis; Investigation; Methodology; Writing – original draft; Writing – review & editing.