DigitalSEE: A Digital Repository for South Eastern Europe in the 18th–19th Century

Open Access
|Nov 2024

(1) Overview

Repository location

https://doi.org/10.6084/m9.figshare.26893756.v1

Context

DigitalSEE (Digital South-Eastern Europe) is a comprehensive virtual repository encompassing various images (woodcuts, engravings, maps) and textual collections, including travelogues, diplomatic reports, newspapers, journals, and archival documents. The project engages in meticulous source tracing and authenticity analysis of artifacts and monuments, utilizing the archives of distinguished figures such as Felix Kanitz (Kanitz, 1868; 1877), Karel Škorpil, and Konstantin Ireček (Jireček, 1877). The primary objective is to document ancient, medieval, and Ottoman heritage through an exhaustive cross-referencing of 18th–19th century travelogues, Ottoman documents, architectural structures, and archaeological discoveries.

By examining European travel writings, diplomatic records, and cartographic interests from the 18th and 19th centuries, the project aims to elucidate the complex history of the Balkans, including the Eastern Question and the processes of nation-building in Southeastern Europe. The initiative seeks to preserve and disseminate historical information about 19th-century Bulgaria through modern technology.

The central aim is to conduct a diachronic study of the development of cultural heritage and identity in the Balkans during the 18th and 19th centuries, with a particular focus on the (Kabadayi, et al. 2022) and the Lower Danube Region (Moesia) (Vezenkov, 2017). The research endeavours to reinterpret the significance of Balkan heritage and identity, influenced by Enlightenment ideas and the rise of nationalism. By integrating modern technologies and artificial intelligence, the project seeks to address questions concerning Balkan and Bulgarian heritage, exploring how these identities have been “translated” over time.

This project is in its early stage of development. We are planning to expand the data set further, focusing on refining the data model and improving the DigitalSEE platform’s access and usability.

(2) Method

Data Model and Standards

The dataset is designed to manage historical and archaeological data, incorporating geographical coordinates, site descriptions, dating criteria, and provenance information, adhering to the TEI EpiDoc standard (Elliott et al., 2006–2022). The dataset is currently formatted in custom-built non-proprietary XML (Bullock et al., 2019) and JSON master files for efficient file management (Preston, 2021), with the XML model tailored to handle data on movable and immovable objects. Since the source material is mainly textual, we plan to adopt the TEI XML format (TEI Consortium, 2023). We are also mindful of the International Committee for Documentation Conceptual Reference Model (CIDOC CRM), the international standard for cultural heritage data (Doerr, 2003; Faraj et al., 2021; ISO, 2023). In future iterations of the dataset, we aim to include options for exporting data in both standards to improve interoperability, facilitate reuse, and support integration into larger international projects focused on data from the Early Modern period.[1]
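To illustrate the idea of parallel XML and JSON master files, the following is a minimal sketch in Python using only the standard library. The element names are taken from Table 1; the root element name `record`, the flat nesting, and all sample values are assumptions made for illustration and do not reproduce the project's actual schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: keys follow Table 1, values are invented placeholders.
record = {
    "author": "Team Member",
    "nameSource": "Example site name from a travelogue",
    "latitude": "43.0",
    "longitude": "25.0",
    "originalLanguage": "ger",
}


def record_to_xml(data: dict) -> ET.Element:
    """Build a flat XML element with one child per key (assumed layout)."""
    root = ET.Element("record")  # root element name is an assumption
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = value
    return root


def xml_to_record(root: ET.Element) -> dict:
    """Read the flat XML element back into a dictionary."""
    return {child.tag: (child.text or "") for child in root}


# Serialise the same record in both formats.
xml_text = ET.tostring(record_to_xml(record), encoding="unicode")
json_text = json.dumps(record, ensure_ascii=False, indent=2)

# Both serialisations carry the same information.
assert xml_to_record(ET.fromstring(xml_text)) == json.loads(json_text)
```

Keeping the two serialisations derivable from one in-memory structure is one way to prevent the XML and JSON files from drifting apart; whether the project generates them this way is not stated in the article.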

Steps

  1. Research Design and Source Tracing: The project team members research a topic or place of interest and find relevant information in historical sources, such as travelogues and newspaper articles. The information is authenticated through source tracing and cross-referenced with modern data.

  2. Software: We have developed a custom Python Flask (Simeonov, 2024a) web application. The application allows researchers to submit data through a structured form, ensuring the consistency of the dataset. It is currently available on GitHub and Zenodo, allowing researchers to replicate the project’s results.

  3. Data Visualization: The DigitalSEE front-end platform is specifically tailored to represent the textual and geographic data for sites found within the region of interest. The platform also features robust content management functionalities, including an administration panel, user management system, error reporting, image editing and enhanced search capabilities using different keywords and filters (DigitalSEE, 2024). The project also utilises a simpler data visualization tool hosted on GitHub and connected to a HuggingFace space for in-browser visualization (Simeonov, 2024b).

Sampling strategy

The sampling strategy emphasises source tracing and authenticity analysis of textual and visual content related to the perception and reception of artifacts and monuments. Regarding sourcing, the information we seek is often dispersed across various travelogues to the Orient, as well as archival and cartographic materials. The primary criterion for selecting these sources is that they are texts from the 16th to the 19th centuries, with particular emphasis on those from the 18th to the 19th centuries, and that they contain descriptions of the European territories of the Ottoman Empire and the Balkans.

The information is gathered from travel literature (Gruber, 2022), predominantly travelogues to the Orient published in the 16th to 18th centuries (extensive collections of old-printed books), but also from manuscripts, published critical sources, engravings, and archival material. The 19th-century archival data, obtained through research and analysis of the Felix Kanitz, Karel Škorpil, and Konstantin Ireček archival documents, play a pivotal role in provenance research and in analysing the reliability of the information in sources from the 16th to 18th centuries.

The database, like other similar digital initiatives, includes the identified travelogues and supplementary information such as the travellers’ names and places of origin, and intertextual relationships between the publications. The present dataset is a sample of approximately 25 travelogues from the 17th to 18th centuries, including works by Mary Montague, Gerhard Cornelius Driesch, William Macmichael, Georg Christoph von Neitzschitz, and Conrad Jacob Hiltebrandt, selected to identify and document ancient, medieval, and Ottoman heritage, encompassing both movable and immovable artifacts.

Quality control

The most important route in the Flask web application described above is “/submit”, declared with a route decorator, which handles both GET and POST requests and serves as a quality assurance tool. The route exposes multiple input fields representing various aspects of a historical site or an archaeological object; some of them are required and must be filled in a prespecified manner according to chosen conventions (Table 1). We opted to use dropdown menus and checkboxes to increase the consistency of the dataset. In the table below, some entries in the Required column carry asterisks: one asterisk indicates that the field is required if the provenance of the artifact is known, while two asterisks indicate that the field is required if the current location of the artifact is known to us.

Table 1

XML Tags for Historical Sites and Objects used in the DigitalSEE project.

| XML tag/key | Description | Required |
| --- | --- | --- |
| author | Name of the team member who is the author of the information | yes |
| nameSource | Name of the site/object according to the source | yes |
| nameContemporary | Contemporary name of the site/object (if applicable) | yes |
| description | Description of the site/object (form, dimensions, etc.) | no |
| provenanceOrigin | Information from the source where the site/object was originally found | no |
| geographicCoordinates | Geographic Coordinates | yes |
| latitude | Latitude | yes |
| longitude | Longitude | yes |
| geonamesLink | Reference link to GeoNames | yes |
| pleiadesLink | Reference link to Pleiades | no |
| date | Dating of the site/object according to the source | yes |
| datingCriteria | Dating Criteria | no |
| localizationSource | Localization Source | no |
| localizationCertainty | Localization Certainty | no |
| age | Age according to the Source (Prehistory, Iron Age, Roman Age, Late Antiquity, Middle Ages, Ottoman Period) | no |
| provenanceObservedIn | Subsequent information where an object was observed | no |
| geographicCoordinatesObserved | Geographic coordinates associated with the subsequent places where the object was observed | yes* |
| latitudeObserved | Latitude | yes* |
| longitudeObserved | Longitude | yes* |
| geonamesLinkObserved | Subsequent reference link to GeoNames | yes* |
| pleiadesLinkObserved | Subsequent reference link to Pleiades | no |
| dateObserved | Dating of the object according to subsequent information | no |
| datingCriteriaObserved | Dating Criteria | no |
| provenanceOtherLocations | When there are other subsequent places where the object was observed | yes* |
| latitudeOther | Latitude | yes* |
| longitudeOther | Longitude | yes* |
| geonamesLinkOtherLocations | Subsequent reference link to GeoNames | yes* |
| dateOtherLocations | Date | yes* |
| datingCriteriaOtherLocations | Dating of the object according to other subsequent information | no |
| currentLocation | Current location of the object, e.g. museum repository | yes** |
| geographicCoordinatesCurrent | Geographic coordinates associated with the current location of the object | yes** |
| latitudeCurrent | Latitude | yes** |
| longitudeCurrent | Longitude | yes** |
| geonamesLinkCurrent | Reference for the current location of the object in GeoNames | yes** |
| desc | Reference to predetermined categories (Inscriptions, Manuscripts, Cult sites, Communications, Fortifications, Ancient monuments, Other) | yes |
| list | Subcategory to the predetermined categories | yes |
| informationDates | Start and End Dates of the Information | no |
| startDate | Start date of the Information, if applicable | no |
| endDate | End date of the information, if applicable | no |
| ageContemporary | Age to which the site/object relates to according to modern sources | no |
| originalLanguage | Original Language of the Source | yes |
| publicationLanguage | Language of the Publication read by the team member | yes |
| sourceInformation | Bibliographic/Archival Information for the Source | yes |
| annotation | Commentary from a team member regarding the site/object | no |
| keyword | Keywords used for sorting the files | yes |
| sourceContent | Excerpt or quote from the source | no |
| copyrightStoragePlace | Copyright/storage place of the source | yes |
| viaf | VIAF | no |
| iiif | IIIF | no |
| authorPublication | Author of the source information (travelogue, manuscript, etc.) | no |

(*If there is the provenance of the artifacts)
(**If the location is known to us)

[i] The table presents the XML tags with their descriptions and shows which of them are required for recording details about historical sites and objects in the DigitalSEE project.
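The required and conditionally required fields of Table 1 can be expressed as a small validation routine. The sketch below is in plain Python and is not the project's code: the field names come from Table 1, but the function name `missing_fields`, the choice of `provenanceObservedIn` and `currentLocation` as trigger fields, and the omission of container tags such as geographicCoordinates are assumptions made for illustration. In a Flask application, a check of this kind could plausibly run in the POST branch of the “/submit” route before a record is saved.

```python
# Field names come from Table 1; the validation logic itself is an assumption.
ALWAYS_REQUIRED = [
    "author", "nameSource", "nameContemporary", "latitude", "longitude",
    "geonamesLink", "date", "desc", "list", "originalLanguage",
    "publicationLanguage", "sourceInformation", "keyword",
    "copyrightStoragePlace",
]

# Assumed triggers: if the key is filled in, the listed fields become required.
CONDITIONALLY_REQUIRED = {
    # one asterisk in Table 1: provenance of the artifact is known
    "provenanceObservedIn": [
        "latitudeObserved", "longitudeObserved", "geonamesLinkObserved",
    ],
    # two asterisks in Table 1: current location of the artifact is known
    "currentLocation": [
        "latitudeCurrent", "longitudeCurrent", "geonamesLinkCurrent",
    ],
}


def missing_fields(form: dict) -> list:
    """Return the names of required fields that are absent or blank."""

    def is_empty(name: str) -> bool:
        return not str(form.get(name, "")).strip()

    missing = [name for name in ALWAYS_REQUIRED if is_empty(name)]
    for trigger, dependents in CONDITIONALLY_REQUIRED.items():
        if not is_empty(trigger):
            missing += [name for name in dependents if is_empty(name)]
    return missing


# Usage: a submission that fills every unconditional field plus a current
# location, but omits the coordinates of that location.
submission = {name: "placeholder" for name in ALWAYS_REQUIRED}
submission["currentLocation"] = "Example museum"
print(missing_fields(submission))
```

Returning the list of missing names, rather than a simple true/false result, lets a form re-render with the specific fields highlighted.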

(3) Dataset Description

Repository name

DigitalSEE (Digital South-Eastern Europe)

Object name

Bestroi150-DigitalSEE-9abf9c4, containing subfolders titled JSON, XML-EN, XML-BG.

Format names and versions

The dataset is available in XML and JSON master files.

Creation dates

2024-04-24

Dataset creators

Maria Baramova (leading researcher), Nicolay Sharankov (researcher), Dimitar Iliev (researcher), Ivan Parvev (researcher), Chavdar Kirilov (researcher), Ivan Valchev (researcher), Vania Racheva (researcher), Kristiyan Simeonov (researcher) (All the researchers are from Sofia University “St. Kliment Ohridski”).

Language

The variables in the dataset are named in English. The dataset specifies information about the source’s original and publication languages with an attribute that denotes language codes according to the ISO 639-2 standard. Direct quotes or excerpts from primary German, Latin, and French sources are also available under the element sourceContent.

License

CC BY 4.0

Publication date

2024-09-02

(4) Reuse Potential

The dataset can be reused by scholars in fields such as history, archaeology, and geospatial studies. One key avenue for reuse is topic modelling of the sourceContent element using Python. This process would assign weights to words, helping to explore how specific concepts related to the objects of interest are formulated (Zhang et al., 2015). Additionally, elements containing geographical coordinates, descriptions, and dates can be processed with Python libraries such as pandas and exported to a CSV file. This file can be analysed further in GIS software, such as QGIS or ArcGIS, and integrated into larger research projects focused on the Early Modern period. This approach could also support more in-depth network analysis.
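As a rough illustration of the CSV export described above, the following sketch flattens XML records into a table that GIS software can import. It deliberately uses Python's standard `csv` and `xml.etree.ElementTree` modules instead of pandas so that it runs without extra dependencies. The element names follow Table 1, while the root element name `record`, the nesting of latitude and longitude inside geographicCoordinates, and the sample values are assumptions rather than the dataset's documented structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Columns to export; names follow Table 1.
FIELDS = ["nameSource", "latitude", "longitude", "date", "description"]


def records_to_csv(xml_documents, fields=FIELDS) -> str:
    """Flatten XML records into CSV text, one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for document in xml_documents:
        root = ET.fromstring(document)
        row = {}
        for name in fields:
            # Take the first matching descendant; missing elements become "".
            row[name] = (root.findtext(f".//{name}") or "").strip()
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical record with invented values, used only to exercise the function.
sample = (
    "<record>"
    "<nameSource>Example</nameSource>"
    "<geographicCoordinates>"
    "<latitude>43.0</latitude><longitude>25.0</longitude>"
    "</geographicCoordinates>"
    "<date>1720</date>"
    "</record>"
)

csv_text = records_to_csv([sample])
print(csv_text)
```

The resulting text can be written to a file and loaded as a delimited text layer in QGIS, or read with `pandas.read_csv` for further analysis.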

A limitation of the research data pertains to the reception and interpretation of historical objects from the Antiquity and Medieval periods. These objects, which are explicitly referenced within the research materials, serve as a foundational basis for provenance research.

Notes

[2] See: A project of the Austrian Academy of Sciences: https://travelogues.github.io/; a project of the Berlin-Brandenburg Academy of Sciences and Humanities: https://thesaurus.bbaw.de/de.

Funding Information

This study is financed by the European Union-NextGenerationEU, through the National Recovery and Resilience Plan of the Republic of Bulgaria, project No BG-RRP-2.004-0008.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

Maria Baramova: Conceptualization; Investigation; Writing – original draft

Kristiyan Simeonov: Investigation; Data curation; Software; Writing – original draft

DOI: https://doi.org/10.5334/johd.241 | Journal eISSN: 2059-481X
Language: English
Submitted on: Sep 2, 2024
Accepted on: Oct 2, 2024
Published on: Nov 6, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Maria Baramova, Kristiyan Simeonov, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.