(1) Overview
Repository location
Context
DigitalSEE (Digital South-Eastern Europe) is a comprehensive virtual repository encompassing various images (woodcuts, engravings, maps) and textual collections, including travelogues, diplomatic reports, newspapers, journals, and archival documents. The project engages in meticulous source tracing and authenticity analysis of artifacts and monuments, utilizing the archives of distinguished figures such as Felix Kanitz (Kanitz, 1868; 1877), Karel Škorpil, and Konstantin Ireček (Jireček, 1877). The primary objective is to document ancient, medieval, and Ottoman heritage through an exhaustive cross-referencing of 18th–19th century travelogues, Ottoman documents, architectural structures, and archaeological discoveries.
By examining European travel writings, diplomatic records, and cartographic interests from the 18th and 19th centuries, the project aims to elucidate the complex history of the Balkans, including the Eastern Question and the processes of nation-building in Southeastern Europe. The initiative seeks to preserve and disseminate historical information about 19th-century Bulgaria through modern technology.
The central aim is to conduct a diachronic study of the development of cultural heritage and identity in the Balkans during the 18th and 19th centuries, with a particular focus on the (Kabadayi et al., 2022) and the Lower Danube Region (Moesia) (Vezenkov, 2017). The research endeavours to reinterpret the significance of Balkan heritage and identity, as shaped by Enlightenment ideas and the rise of nationalism. By integrating modern technologies and artificial intelligence, the project seeks to address questions concerning Balkan and Bulgarian heritage, exploring how these identities have been “translated” over time.
This project is at an early stage of development. We plan to expand the dataset further, focusing on refining the data model and improving the DigitalSEE platform’s access and usability.
(2) Method
Data Model and Standards
The dataset is designed to manage historical and archaeological data, incorporating geographical coordinates, site descriptions, dating criteria, and provenance information, adhering to the TEI EpiDoc standard (Elliott et al., 2006–2022). The dataset is currently formatted in custom-built non-proprietary XML (Bullock et al., 2019) and JSON master files for efficient file management (Preston, 2021), with the XML model tailored to handle data on movable and immovable objects. Since the source material is mainly textual, we plan to employ the TEI XML format (TEI Consortium, 2023). We are also mindful of the International Committee for Documentation Conceptual Reference Model (CIDOC CRM), the international standard for cultural heritage data (Doerr, 2003; Faraj et al., 2021; ISO, 2023). In future iterations of the dataset, we aim to include options for exporting data in both standards to improve interoperability, facilitate reuse, and support integration into larger international projects focused on data from the Early Modern period. [1]
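To make the dual XML/JSON master-file idea concrete, the following is a minimal sketch of how a single record might be serialised to both formats. The element names are taken from Table 1, but the root tag `record`, the nesting of `latitude` and `longitude` under `geographicCoordinates`, and all values are assumptions made for illustration; the project's actual file layout may differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: field names follow Table 1, values are invented placeholders.
record = {
    "author": "Example Researcher",
    "nameSource": "Example site name",
    "nameContemporary": "Example modern name",
    "geographicCoordinates": {"latitude": "43.0", "longitude": "25.0"},
    "date": "2nd century",
}


def to_xml(data, root_tag="record"):
    """Turn a (possibly nested) dict into an ElementTree element.

    The root tag "record" is an assumption, not the project's documented schema.
    """
    root = ET.Element(root_tag)
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested dicts become nested elements named after their key.
            root.append(to_xml(value, key))
        else:
            ET.SubElement(root, key).text = str(value)
    return root


# The same record, serialised as an XML string and as a JSON string.
xml_text = ET.tostring(to_xml(record), encoding="unicode")
json_text = json.dumps(record, ensure_ascii=False, indent=2)
```

Keeping one in-memory representation and deriving both serialisations from it is one way to ensure that the XML and JSON master files do not drift apart.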
Steps
Research Design and Source Tracing: The project team members research a topic or place of interest and find relevant information in historical sources, such as travelogues and newspaper articles. The information is authenticated through source tracing and cross-referenced with modern data.
Software: We have developed a custom Python Flask (Simeonov, 2024a) web application. The application allows researchers to submit data through a structured form, ensuring the consistency of the dataset. It is currently available on GitHub and Zenodo, allowing researchers to replicate the project’s results.
Data Visualization: The DigitalSEE front-end platform is specifically tailored to represent the textual and geographic data for sites found within the region of interest. The platform also features robust content management functionalities, including an administration panel, user management system, error reporting, image editing and enhanced search capabilities using different keywords and filters (DigitalSEE, 2024). The project also utilises a simpler data visualization tool hosted on GitHub and connected to a HuggingFace space for in-browser visualization (Simeonov, 2024b).
Sampling strategy
The sampling strategy emphasises source tracing and authenticity analysis of textual and visual content related to the perception and reception of artifacts and monuments. Regarding sourcing, the information we seek is often dispersed across various travelogues to the Orient, as well as archival and cartographic materials. The primary criterion for selecting these sources is that they are texts from the 16th to the 19th centuries, with particular emphasis on those from the 18th to the 19th centuries, and that they contain descriptions of the European territories of the Ottoman Empire and the Balkans.
The information is gathered from travel literature (Gruber, 2022), predominantly travelogues to the Orient published in the 16th to 18th centuries (extensive collections of old-printed books), but also from manuscripts, published critical sources, engravings, and archival material. The archival data from the 19th century, obtained through research and analysis of the Felix Kanitz, Karel Škorpil, and Konstantin Ireček archival documents, play a pivotal role in provenance research and in assessing the reliability of the information in sources from the 16th to 18th centuries.
As in other similar digital initiatives, the database records the identified travelogues together with supplementary information such as the travellers’ names and places of origin and the intertextual relationships between the publications. The present dataset is a sample of approximately 25 travelogues from the 17th to 18th centuries, including works by Mary Montagu, Gerhard Cornelius Driesch, William Macmichael, Georg Christoph von Neitzschitz, and Conrad Jacob Hiltebrandt. It is used to identify and document ancient, medieval, and Ottoman heritage, encompassing both movable and immovable artifacts.
Quality control
The most important route in the Flask web application described above is “/submit”, declared through a route decorator; it handles both GET and POST requests and serves as a quality assurance tool. The route exposes multiple input fields representing various aspects of a historical site or an archaeological object; some are required and must be filled in a prespecified manner according to chosen conventions (Table 1). We opted to use dropdown menus and checkboxes to increase the consistency of the dataset. In the table below, some rows with a required field carry asterisks: one asterisk indicates that the field is required only if the provenance of the artifact is known, while two asterisks indicate that it is required only if the current location of the artifact is known to us.
Table 1
XML Tags for Historical Sites and Objects used in the DigitalSEE project.
| XML TAGS/KEY | DESCRIPTION | REQUIRED |
|---|---|---|
| author | Name of the team member who is the author of the information | yes |
| nameSource | Name of the site/object according to the source | yes |
| nameContemporary | Contemporary name of the site/object (if applicable) | yes |
| description | Description of the site/object (form, dimensions, etc.) | no |
| provenanceOrigin | Information from the source where the site/object was originally found | no |
| geographicCoordinates | Geographic Coordinates | yes |
| latitude | Latitude | yes |
| longitude | Longitude | yes |
| geonamesLink | Reference link to GeoNames | yes |
| pleiadesLink | Reference link to Pleiades | no |
| date | Dating of the site/object according to the source | yes |
| datingCriteria | Dating Criteria | no |
| localizationSource | Localization Source | no |
| localizationCertainty | Localization Certainty | no |
| age | Age according to the Source (Prehistory, Iron Age, Roman Age, Late Antiquity, Middle Ages, Ottoman Period) | no |
| provenanceObservedIn | Subsequent information where an object was observed | no |
| geographicCoordinatesObserved | Geographic coordinates associated with the subsequent places where the object was observed | yes* (*if the provenance of the artifact is known) |
| latitudeObserved | Latitude | yes* |
| longitudeObserved | Longitude | yes* |
| geonamesLinkObserved | Subsequent reference link to GeoNames | yes* |
| pleiadesLinkObserved | Subsequent reference link to Pleiades | no |
| dateObserved | Dating of the object according to subsequent information | no |
| datingCriteriaObserved | Dating Criteria | no |
| provenanceOtherLocations | When there are other subsequent places where the object was observed | yes* |
| latitudeOther | Latitude | yes* |
| longitudeOther | Longitude | yes* |
| geonamesLinkOtherLocations | Subsequent reference link to GeoNames | yes* |
| dateOtherLocations | Date | yes* |
| datingCriteriaOtherLocations | Dating of the object according to other subsequent information | no |
| currentLocation | Current location of the object, e.g. museum repository | yes** (**If the location is known to us) |
| geographicCoordinatesCurrent | Geographic coordinates associated with the current location of the object | yes** |
| latitudeCurrent | Latitude | yes** |
| longitudeCurrent | Longitude | yes** |
| geonamesLinkCurrent | Reference for the current location of the object in GeoNames | yes** |
| desc | Reference to predetermined categories (Inscriptions, Manuscripts, Cult sites, Communications, Fortifications, Ancient monuments, Other) | yes |
| list | Subcategory to the predetermined categories | yes |
| informationDates | Start and End Dates of the Information | no |
| startDate | Start date of the Information, if applicable | no |
| endDate | End date of the information, if applicable | no |
| ageContemporary | Age to which the site/object relates according to modern sources | no |
| originalLanguage | Original Language of the Source | yes |
| publicationLanguage | Language of the Publication read by the team member | yes |
| sourceInformation | Bibliographic/Archival Information for the Source | yes |
| annotation | Commentary from a team member regarding the site/object | no |
| keyword | Keywords used for sorting the files | yes |
| sourceContent | Excerpt or quote from the source | no |
| copyrightStoragePlace | Copyright/storage place of the source | yes |
| viaf | VIAF | no |
| iiif | IIIF | no |
| authorPublication | Author of the source information (travelogue, manuscript, etc.) | no |
[i] The table lists the XML tags used to record details about historical sites and objects in the DigitalSEE project, with a description of each tag and an indication of whether it is required.
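The conditional requirements in Table 1 can be illustrated with a small validation function of the kind a form handler might call when a submission is posted. This is a sketch under stated assumptions, not the project's actual code: the function name `missing_fields` is hypothetical, only a subset of the tags is checked, and the choice of `provenanceObservedIn` and `currentLocation` as the fields that trigger the one-asterisk and two-asterisk rules is our reading of the table.

```python
# Fields marked "yes" in Table 1 (a subset, kept short for illustration).
ALWAYS_REQUIRED = [
    "author", "nameSource", "nameContemporary",
    "latitude", "longitude", "geonamesLink", "date",
]
# Fields marked "yes*": assumed to apply once a place of observation is recorded.
REQUIRED_IF_OBSERVED = ["latitudeObserved", "longitudeObserved", "geonamesLinkObserved"]
# Fields marked "yes**": assumed to apply once the current location is recorded.
REQUIRED_IF_CURRENT = ["latitudeCurrent", "longitudeCurrent", "geonamesLinkCurrent"]


def missing_fields(form):
    """Return the names of required fields that are absent or blank in `form`."""

    def is_blank(name):
        return not str(form.get(name, "")).strip()

    required = list(ALWAYS_REQUIRED)
    if not is_blank("provenanceObservedIn"):
        required += REQUIRED_IF_OBSERVED
    if not is_blank("currentLocation"):
        required += REQUIRED_IF_CURRENT
    return [name for name in required if is_blank(name)]


# Example submission with invented placeholder values.
example_form = {name: "placeholder" for name in ALWAYS_REQUIRED}
example_form["currentLocation"] = "Example museum"
problems = missing_fields(example_form)
```

In this example `problems` lists the three `...Current` fields, because a current location was given without its coordinates and GeoNames link. A form handler could reject the submission and redisplay the form whenever the returned list is non-empty.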
(3) Dataset Description
Repository name
DigitalSEE (Digital South-Eastern Europe)
Object name
Bestroi150-DigitalSEE-9abf9c4, containing subfolders titled JSON, XML-EN, XML-BG.
Format names and versions
The dataset is available in XML and JSON master files.
Creation dates
2024-04-24
Dataset creators
Maria Baramova (leading researcher), Nicolay Sharankov (researcher), Dimitar Iliev (researcher), Ivan Parvev (researcher), Chavdar Kirilov (researcher), Ivan Valchev (researcher), Vania Racheva (researcher), Kristiyan Simeonov (researcher) (All the researchers are from Sofia University “St. Kliment Ohridski”).
Language
The variables in the dataset are named in English. The dataset specifies information about the source’s original and publication languages with an attribute that denotes language codes according to the ISO 639-2 standard. Direct quotes or excerpts from primary German, Latin, and French sources are also available under the element sourceContent.
License
CC BY 4.0
Publication date
2024-09-02
(4) Reuse Potential
The dataset has the potential to be reused by scholars in fields such as history, archaeology, and geospatial studies. One promising avenue for reuse is topic modelling of the sourceContent element in Python. This process would assign weights to words, helping to explore how specific concepts related to the objects of interest are formulated (Zhang et al., 2015). Additionally, elements containing geographical coordinates, descriptions, and dates can be processed with Python libraries such as pandas and exported into a CSV file. This file can be further analysed from various perspectives using GIS software, such as QGIS or ArcGIS, and integrated into larger research projects focused on the Early Modern period. This approach could lead to more in-depth network analysis.
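The CSV export described above could be sketched as follows. To keep the example dependency-free it uses only the Python standard library rather than pandas. The wrapper elements `records` and `record`, the flat layout, and the two sample entries are invented for illustration; only the field names come from Table 1.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two invented records in an assumed layout (not the project's real files).
SAMPLE_XML = """<records>
  <record>
    <nameSource>Example A</nameSource>
    <latitude>43.1</latitude>
    <longitude>25.6</longitude>
    <date>1717</date>
  </record>
  <record>
    <nameSource>Example B</nameSource>
    <latitude>42.7</latitude>
    <longitude>23.3</longitude>
  </record>
</records>"""

# Columns to export; tags missing from a record become empty cells.
COLUMNS = ["nameSource", "latitude", "longitude", "date", "description"]


def records_to_rows(xml_text):
    """Extract one dict per <record>, searching nested elements for each tag."""
    root = ET.fromstring(xml_text)
    return [
        {col: (rec.findtext(f".//{col}") or "").strip() for col in COLUMNS}
        for rec in root.iter("record")
    ]


def rows_to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = records_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
```

The same list of dictionaries could equally be passed to `pandas.DataFrame(rows)` and written out with `to_csv`. Either way, the resulting file has latitude and longitude columns that GIS software can load as a delimited-text point layer.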
A limitation of the research data pertains to the reception and interpretation of historical objects from the Antiquity and Medieval periods. These objects, which are explicitly referenced within the research materials, serve as a foundational basis for provenance research.
Notes
[2] See: A project of the Austrian Academy of Sciences: https://travelogues.github.io/; a project of the Berlin-Brandenburg Academy of Sciences and Humanities: https://thesaurus.bbaw.de/de.
Funding Information
This study is financed by the European Union-NextGenerationEU, through the National Recovery and Resilience Plan of the Republic of Bulgaria, project No BG-RRP-2.004-0008.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
Maria Baramova: Conceptualization; Investigation; Writing – original draft
Kristiyan Simeonov: Investigation; Data curation; Software; Writing – original draft
