Structuring Resources for a Database of Balkan Criminality in the Long Nineteenth Century

Roxana Patraș; Constantin Răchită; Mihai-Bogdan Atanasiu; Ana Odochiciuc

doi:10.5334/johd.515

(1) Context

During the long nineteenth century, hajduk fiction was one of the most productive subgenres of the Romanian novel. A wide and socially diverse readership and a myriad of industrious authors were the common traits of the emerging literary market in the Romanian Principalities, Moldavia and Wallachia, which united in 1859, in the wake of the Treaty of Paris (Patraș, 2019; Patraș et al., 2020; Patraș, 2021; Galleron et al., 2021; Pascariu et al., 2022; Olteanu, 2023). Though under Austro-Hungarian rule, Transylvania was also a region with a Romanian majority, sharing the same language and cultural tradition with both Principalities. The three areas lay close to, and under the wider influence of, the Balkan area; therefore, they were affected by hajduk activity and by the folklore and national myths engendered by these outlaws. Situated between the Danube and the Black Sea and formerly part of the Ottoman Empire, Dobruja was equally a land of hajduk activity. This region was incorporated into the United Principalities of Romania in 1878, after the Russo-Turkish War. In a nutshell, the present-day geographical area of Romania circumscribes hajduk life as a way of living or as a form of life, in the sense given to this concept by Giorgio Agamben (2000): “a life that can never be separated from its form”, “a life for which what is at stake in its way of living is living itself”, “a life of power”, “a political life, that is, a life directed toward the idea of happiness” that is thinkable only starting with “the irrevocable exodus from any sovereignty” (pp. 3–15).

The idea of creating an interdisciplinary database and its rationale for a multitude of research endeavours draw from the discovery that, considered as a form of life, hajduk criminality (in Romanian “haiducia”)—usually referring to grand route hijacking or to robbery in blind spots such as woods, pubs, inns, monasteries, country mansions (Gârleanu, 1969, pp. 7–50)—requires a complex approach. Researched in the broad Balkan area and in Romania as a zoom-in study, the historical, economic, social, cultural, imaginary, and even stylistic facets of “haiducia” need a rich documentary framework or a “cadrage documentaire” (Caïra, 2011, p. 136) containing information that might be developed into database attributes. For instance, signposts of factuality (Lavocat, 2020) and degrees/intensities of factuality (Odochiciuc et al., 2025; Atanasiu et al., 2025) can be operationalized and standardized in order to function as attribute classes. Our claim on broadening the docu-frame of literary objects is based on the fact that, as already shown by scholars, the reader is generally ready to sacrifice “the narrative tension, inherent in the introduction of plots in cinematographic or novelistic works of fiction, in order to taste the fruits of factuality” (Lavocat, 2020, p. 589). Marked by “the pleasure of recognition” and by a concrete or substantial memory, factuality thus plays a crucial role in triggering both nineteenth-century readers’ responses to the new literary offer and the expansion of the literary market itself.

Initially, data were gathered with a view to providing historical context, that is, scaffolding a documentary framework (docu-frame) for the toponymic Named Entity Recognition (NER) of Hai-Ro, a literary corpus of 47 hajduk novels published during the long nineteenth century (1850–1920), and counting approximately 1.7 million words (Patraș et al., 2020). Contextualizing the geographical data of Romanian hajduk fiction represented one of the desiderata of the Hai-Ro Project (PN-III-P3-3.1-PM-RO-FR-2019-0063), a bilateral research cooperation between Romania and France that envisaged the broad concept of literary spatiality (Galleron et al., 2021). The present study draws on previous findings on literary spatiality and leans on concepts, approaches and practices belonging to the fields of literary geography, spatial humanities, and computational literary studies (Piatti et al., 2009, p. 178). At the crossroads of mapping, georeferencing, and literary studies, the concept of factuality is here illustrated exclusively through toponymical data. This baseline structure has been devised as a sample for a broader criminality database, connecting people, crimes, contexts, and places in the Balkan area. By criminality, we refer to crimes defined in juridical sources from the long nineteenth century, chiefly to varieties of theft, injury, abduction, rape, and murder. By Balkan area, we mean not only a strictly delimited geographical territory, but also regions such as the Romanian Principalities, Transylvania, and Dobruja, which were influenced, via the Ottoman Empire’s administration, by Balkan culture, trade, and societal types. Being a longue durée periodization, the long nineteenth century was more convenient for the broad scope of the present research endeavour. This time-frame also aligned with the periodization rationale of previous projects encompassing literary objects, chiefly popular novels and genre fiction, from the 1800s to the end of the First World War.

In the first trials, the non-literary data extracted from various printed sources (historical, archaeological, juridical, etc.) were intended to anchor the quantitative literary analysis, designed as a substantive, thus factual, mapping of Romanian hajduk fiction, whose geographical distribution comprises 519 datapoints.

(2) Dataset Description

Repository Location

Zenodo

https://doi.org/10.5281/zenodo.18712335

Object Name

DATABASE ON HAJDUK CRIMINALITY

Format Names and Versions

CSV, GeoJSON, txt files

Creation Dates

From 2025-03-01 to 2025-09-29

Dataset Creators

Roxana Patraș, Conceptualisation, Data Curation, Alexandru Ioan Cuza University, Iași

Constantin Răchită, Data Curation, Validation, Alexandru Ioan Cuza University, Iași

Mihai-Bogdan Atanasiu, Data Curation, Conceptualisation, Alexandru Ioan Cuza University, Iași

Ana Odochiciuc, Software, Validation, Alexandru Ioan Cuza University, Iași

Language

English, Romanian

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Publication Date

2026-02-20

(3) Method

(3.1) Methodological Concerns

Our research underwent a digitization phase involving scanning and OCR, which, from the viewpoint of envisaged database architecture, is trivial and thus will not be dwelt upon. The docu-frame of the hajduks’ form of life (eighteenth to early twentieth century) is based on a repository of 17 CSV files (also available in GeoJSON format) that resulted from the AI-assisted, semi-automated, or manual extraction of textual and map toponyms. This step was followed by disambiguation (for instance, in cases of geolocations sharing the same toponym) and by the identification, with the assistance of OSM, of coordinates for each entry. Coordinates were geocoded using OpenStreetMap data via the Nominatim service (© OpenStreetMap contributors), licensed under ODbL v1.0.
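The geocoding step can be illustrated with a minimal sketch. It assumes the public Nominatim search endpoint and its documented `q`, `format`, `limit`, and `countrycodes` parameters; the function names, the country filter, and the sample payload are hypothetical and are not taken from the authors' workflow. The sketch only builds the request URL and parses a response, so that several candidates remain available for manual disambiguation.

```python
import json
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(toponym, country_codes="ro,md,ua"):
    """Build a Nominatim free-text search URL for one toponym.

    The country filter is an assumption made for this sketch.
    """
    params = {
        "q": toponym,
        "format": "jsonv2",
        "limit": 5,  # keep several candidates for manual disambiguation
        "countrycodes": country_codes,
    }
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def candidates(response_text):
    """Turn a Nominatim JSON response into (name, lon, lat) tuples."""
    return [
        (hit["display_name"], float(hit["lon"]), float(hit["lat"]))
        for hit in json.loads(response_text)
    ]


# Illustrative payload with made-up values, not real service output.
sample = '[{"display_name": "Example place", "lon": "27.5", "lat": "47.2"}]'
print(build_search_url("Podu Iloaiei"))
print(candidates(sample))
```

Any real request should respect the Nominatim usage policy (an identifying User-Agent and a low request rate), which is why the network call itself is left out here.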

We opted for QGIS, version 3.42 (https://www.qgis.org) (last accessed: 01 March 2025), a well-reviewed and user-friendly tool for georeferenced visualisation and analysis that supports vector and raster layers in various formats, including GeoJSON. Among these formats, comma-separated values (CSV) was the best option with a view to broadening the database usage beyond spatial humanities. The template supported by QGIS should contain at least three basic pieces of information (Toponym/Name, Longitude, and Latitude), the last two corresponding to each point’s coordinates (Field X and Field Y), which we duly filled in. Besides an ID column, we added diverse information about the printed source and the additional resources used in the disambiguation process. A functional column containing snippets from primary sources, labelled “observations”, was necessary in the first stage of toponym extraction and checking; it was later removed for copyright reasons.

The selection of primary sources (see the list provided in Digital Humanities Laboratory, 2026b) was made with respect to imaginary, historical, social, and cultural patterns of “haiducia” as a form of life. Accordingly, the spatial criteria and toponyms thereof conjectured correspond to several research questions:

Where do the hajduks hide? (woods/forests, inns)
Where do the hajduks attack? (monasteries, inns, pubs, post stations, grand routes)
Where do the hajduks hide the stolen objects? (money hoards).

Easily recognizable as forming a distinct form of life, the aforementioned patterns might be regarded as “proxies” of the hajduk phenomenon, which was widespread and quite typical of the Balkan area: criminality (crimes filed at the Principalities’ Courts of Justice or reported); nomadism and high mobility (mapped post stations and inns along the travelling routes); hiding (documented woods/forests, caves); celebration of spoils (pubs and inns); hoarding valuable goods (money, guns, clothes); fear and uncertainty (mapped urban peripheries, peri-urban areas, isolated domains). These “proxies” have been determined and then quantified by applying a toponymical reduction of space (Pricop, 2024), which, in the light of geo-criticism (Westphal, 2013), might look tedious; however, the potential complexity of the seven resulting toponymical classes comes to the fore when they are reintegrated in the literary domain and discussed within the spheres of concepts such as “literary factuality” (Pettersson, 2020) or “poetic justice” (Nussbaum, 1995, pp. 79–123).

(3.2) Data Sourcing and Provenance

There are two categories of sources used for toponym extraction: texts and maps, the majority in printed form, which implied pre-processing tasks such as scanning, OCR, and manual clean-up of approx. 20,000 pages. The only digitized resources were the Hai-Ro Corpus, secondary data derived from it in the form of a list of georeferenced toponyms with the DARIAH geo-browser service (https://de.dariah.eu/en/geobrowser) (last accessed: 23 June 2020), and data provided by OSM-OpenStreetMap (https://www.openstreetmap.org) (last accessed: 29 September 2025). The maps have been consulted and copied at BNF, but their use as a QGIS layer required adjustments in Photoshop and mosaic-piecing: the collation of 20 map pieces and the over-colouring of roads, borders, and suchlike with a QGIS tool (Geometry Type/Line String). The textual sources include historical/archaeological studies, toponym dictionaries, geographic dictionaries, and scholarly contributions on Romanian toponymy. The visual sources include old Russian, Austrian, and Romanian maps commissioned in the first half of the nineteenth century, as well as new maps generated with GIS software (e.g. monetary findings in the region of the Principality of Moldavia). As already mentioned, the problems of difficult transliteration, faulty map localization, toponymic fluctuation and change, and various ambiguities have been addressed by adding supplementary bibliography (for instance, on gold extraction sites). To visualize the spatial distribution of our datapoints, the output of the first extraction was structured into seven toponym classes. The table below (see Table 1) specifies, for each file, the toponym class, referential frame, primary source, preprocessing tasks, source type, number of datapoints, and types of extraction that were applied.

Table 1

Data model.

FILE NAME	TOPONYM CLASS	REFERENTIAL FRAME	PRIMARY SOURCE ABBREVIATION	PREPROCESSING TASKS	SOURCE TYPE	DATAPOINTS NUMBER	EXTRACTION TYPE
FINAL_romane haiducesti Romania.csv	General Toponyms	literary	HAI-RO		digital data	519	Automatic with query & filtering
FINAL_localitati department criminalicesc Moldova.csv	Crime Toponyms	factual	DCM	scanning; OCR	text	309	Manual
FINAL_calatori straini Romania.csv	Crime Toponyms	factual	Călători_străini	scanning; OCR	text	103	AI-assisted & filtering
FINAL_calatori straini Romania_rivers_disambiguation_gold sites.csv	Crime Toponyms	factual	Extragere_aur	scanning; OCR	text	71	AI-assisted & filtering
FINAL_localitati dictionar geografic Romania.csv	Dwelling Toponyms	factual	MDGR	scanning; OCR	text	26	Manual
FINAL_tezaure monetare austriece harta Jantovanu.csv; FINAL_tezaure monetare harta Jantovanu.csv; FINAL_tezaure monetare otomane harta Jantovan.csv; FINAL_tezaure monetare rusesti harta Jantovanu.csv; FINAL_tezaure monetare Sadagura harta Jantovanu.csv	Money-Hoarding Toponyms		Moneda_Moldova	scanning	map	72; 103; 122; 79; 24	Manual
FINAL_hanuri harta 1820 Romania.csv	Inn Toponyms	factual		mosaic-piecing, digital reconstruction	map	18	Manual
FINAL_poste harta 1820.csv	Post Stations Toponyms	factual		mosaic-piecing, digital reconstruction	map	196	Manual
FINAL_paduri_moldova.csv	Wood/Forest Toponyms	factual	TTM	scanning, OCR	text	909	Manual
FINAL_paduri_muntenia.csv	Wood/Forest Toponyms	factual	DTRM	scanning, OCR	text	2,968	Manual
FINAL_paduri_oltenia.csv	Wood/Forest Toponyms	factual	DTRO	scanning, OCR	text	871	Manual
FINAL_paduri_transilvania.csv	Wood/Forest Toponyms	factual	TTRT	scanning, OCR	text	477	Manual
FINAL_romania main cities.csv	City Toponyms	factual	OSM		digital data	63	Automatic with query

A supplementary file containing the 63 main cities of Romania (FINAL_romania main cities.csv) was created in order to connect and overlap the two map layers in QGIS: the 1820 Russian map and, respectively, the satellite OSM standard map available through a software plugin. Both the CSV files and GeoJSON files were named to easily track their primary sources and respective acronyms/abbreviations (sigles). Some of these primary data files also served for creating derived data files, such as collated datapoints for “hajduks stays”, “hajduk memory”, “hajduk hideouts”, etc., that have been used for various experimental visualizations. Derived data have not been uploaded to our repository as they resulted from simple operations like appending and removing duplicates. The primary data are available in DATABASE ON HAJDUK CRIMINALITY (Version 2) at https://doi.org/10.5281/zenodo.18712335.

(4) Processing and Structuring

(4.1) Extraction

The extraction of toponyms was carried out manually, semi-automatically, or with the assistance of AI. Each type of primary source required customization based on the specifics of the unstructured data therein.

In the case of the Hai-Ro corpus, the toponyms were selected from a larger list of words starting with capital letters, extracted with a simple regex. This resource has not been subjected to further refinement because, in order to function as a contrast material in a factuality framework, the literary imagination that favoured the archetype of the noble rebel (the national hajduk) and the ideal of social justice (equal chances and access to resources) needs to be kept as ambiguous or as fictional as possible.
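The capital-letter extraction can be sketched as follows. The exact regex applied to the Hai-Ro corpus is not given in this paper, so the pattern below, the function name, and the sample sentence are assumptions made for illustration only. In keeping with the unrefined character of this resource, sentence-initial words are deliberately left in the output.

```python
import re
from collections import Counter

# Hypothetical pattern: a capitalised word, optionally hyphenated,
# allowing Romanian diacritics in both comma-below and cedilla forms.
UPPER = "A-ZĂÂÎȘȚŞŢ"
LOWER = "a-zăâîșțşţ"
CAPITALISED = re.compile(
    rf"\b[{UPPER}][{LOWER}]+(?:-[{UPPER}][{LOWER}]+)*"
)


def candidate_toponyms(text):
    """Count capitalised tokens; no filtering of non-toponyms is attempted."""
    return Counter(CAPITALISED.findall(text))


# Invented sample sentence, not a quotation from the corpus.
sample = "Haiducii au plecat din Iași spre Târgu-Mureș. Apoi au ajuns la Iași."
counts = candidate_toponyms(sample)
print(counts.most_common())
```

The resulting list still contains ordinary words such as sentence openers, which is why a subsequent manual selection of toponyms from the larger list is needed.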

For the majority of primary sources that underwent prior scanning and OCR, the queried key terms have been determined by considering the level of structuring of the original materials (dictionary entries, indexes, abbreviations). For instance, the hajduk stays have been tracked by searching keywords such as “cârciumă” [pub], “crâșmă” [pub] and “han” [inn], but also other old and regional forms like “crâcimă” (also taking into account possible spelling errors caused by OCR recognition), as well as frequently collocated verbs like “au mas” [to stay], occurring in the documents of the Criminal Department of the Principality of Moldavia. The geographical and historical sources were queried using keywords denoting social status, such as “haiduc” [hajduk], “tâlhar” [robber, thief], and related synonyms.
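A keyword query of this kind might look like the sketch below. The keywords are those quoted above; the tolerant character classes for OCR confusions, the inflected forms of “han”, the function name, and the sample lines are assumptions introduced for the example, not the authors' actual query.

```python
import re

# Keyword variants quoted in the text. The character classes are a
# hypothetical way to tolerate OCR confusion of â/a/î and ș/ş/s.
STAY_PATTERN = re.compile(
    r"\b(?:c[âaî]rcium\w*|cr[âaî][șşs]m\w*|cr[âaî]cim\w*"
    r"|han(?:ul|uri|urile)?|au\s+mas)\b",
    re.IGNORECASE,
)


def keyword_hits(lines):
    """Yield (line_number, matched_form, line) for each keyword occurrence."""
    for number, line in enumerate(lines, start=1):
        for match in STAY_PATTERN.finditer(line):
            yield number, match.group(0), line.strip()


# Invented lines imitating OCR output, not quotations from the sources.
sample_lines = [
    "au mas la hanul din sat",
    "s-au oprit la o crasma",
    "nimic relevant aici",
]
hits = list(keyword_hits(sample_lines))
for hit in hits:
    print(hit)
```

Returning the full line with each hit keeps enough context for the manual checking that follows the query.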

Data on woods/forests from the four Romanian regions (Moldavia, Wallachia, Oltenia, and Transylvania) have been extracted with careful consideration of the various methodologies and localizations formerly employed by toponymists engaged in the research of the aforementioned historical regions. As research on Transylvanian and Dobrujan toponymy is still ongoing, we expect that more data will be added before long. In each of the four regions mentioned above, we searched for key-terms such as “pădur”, “codr”, “dendron”, “fiton” or “păd”, we identified the dictionary/thesaurus entry that contained them, and then we extracted data on surroundings (village, commune or county names) that enabled as precise georeferencing as possible.

The criminality places reported by the foreign travellers in the Romanian Principalities have been determined and checked as follows:

fed NotebookLM (Gemini 1.5 Pro version/accessed in May 2025) with 12 pdf-s of primary sources (the series Călători_străini);
prompted in a few shots the same request for all volumes in the series;
evaluated results provided by several NotebookLM notes and data tables;
manually checked information in the notes and data tables by retrieving larger contexts from pdf-s;
generated a list of toponyms.

The prompt used to search the material with the AI tool comprised a set of typo variants, synonyms, and keywords serving as proxies for the hajduks’ form of life (see prompts_NotebookLM.txt in Digital Humanities Laboratory, 2026a). Toponymical data extracted with AI tools represent 2.51% of the total of 6,930 final places. Because rivers need a different data support (line vector and not point, currently adjoined to toponyms), river names mentioned by travellers in contexts of robbery have been “punctured” by searching for gold extraction sites along their courses.
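The share of AI-assisted data can be re-derived from the datapoint counts in Table 1. In the short check below the counts are copied from that table, while the dictionary keys for the hoard, inn, and post-station files are labels invented for this sketch.

```python
# Datapoints per source, copied from Table 1; the five hoard files are summed.
# Keys "hoards", "inns_1820" and "post_1820" are labels made up for this sketch.
datapoints = {
    "HAI-RO": 519,
    "DCM": 309,
    "Călători_străini": 103,
    "Extragere_aur": 71,
    "MDGR": 26,
    "hoards": 72 + 103 + 122 + 79 + 24,
    "inns_1820": 18,
    "post_1820": 196,
    "TTM": 909,
    "DTRM": 2968,
    "DTRO": 871,
    "TTRT": 477,
    "OSM": 63,
}
# The two files whose extraction type is "AI-assisted & filtering".
ai_assisted = ("Călători_străini", "Extragere_aur")

total = sum(datapoints.values())
ai_total = sum(datapoints[key] for key in ai_assisted)
share = 100 * ai_total / total
print(total, ai_total, round(share, 2))  # prints: 6930 174 2.51
```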

The post stations, inns, and other types of data related to the grand routes, waters, or relief mapped by the Russians in 1820 have been transliterated from Cyrillic into the Latin alphabet. The money-hoarding sites have been extracted from maps provided in Elena Arcuș-Jantovan’s contribution (2022), referring to historical regions such as Bessarabia (currently Republic of Moldova, located on the left bank of the Prut River), the Northern part of Bukovina (currently in Ukraine) and the region of Moldavia (currently Romania, located on the right bank of the Prut River).

The toponyms extracted from textual and map sources have been inserted in CSV format, with data standardised in a minimum of seven columns: A. ID; B. Name – toponyms from primary sources; C. Address – current name of an address located as close as possible to the toponym mentioned in column B (cross-checked with OSM); D. Sigle in LIST of PRIMARY SOURCES – contains a general acronym for a cluster/series of primary sources; E. Primary Source – short description of data; F. Longitude – decimal degrees, OSM; G. Latitude – decimal degrees, OSM.
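A reuser may wish to check a file against this seven-column layout before loading it into GIS software. The sketch below assumes header spellings that may differ from those in the published files (in particular the shortened “Sigle”), and its sample rows are invented.

```python
import csv
import io

# Assumed header spellings; the published files may label columns differently.
REQUIRED = ["ID", "Name", "Address", "Sigle", "Primary Source",
            "Longitude", "Latitude"]


def validate_rows(handle):
    """Return a list of (row_number, problem) pairs for a toponym CSV."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [(1, f"missing columns: {', '.join(missing)}")]
    problems = []
    for number, row in enumerate(reader, start=2):  # row 1 is the header
        try:
            lon, lat = float(row["Longitude"]), float(row["Latitude"])
        except (TypeError, ValueError):
            problems.append((number, "non-numeric coordinate"))
            continue
        # Decimal degrees: longitude within +/-180, latitude within +/-90.
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append((number, "coordinate out of range"))
    return problems


# Invented rows for illustration; the second data row has a broken latitude.
sample = io.StringIO(
    "ID,Name,Address,Sigle,Primary Source,Longitude,Latitude\n"
    "1,Example A,Example address,XYZ,illustrative row,27.5,47.2\n"
    "2,Example B,Example address,XYZ,illustrative row,47.2,abc\n"
)
report = validate_rows(sample)
print(report)
```

A range check of this kind does not detect swapped longitude and latitude values, which for this region both fall within valid bounds; visual inspection on a map remains necessary for that.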

Figure 1 shows the workflow that integrates all transformations from primary sources to CSV and GeoJSON files.

Flowchart: from primary sources to QGIS integration.

(4.2) Challenges and Limitations

The main challenge we encountered was disambiguating toponyms in primary sources. Briefly, in some cases, two or more geolocations share the same toponym, while in others a single geolocation has borne several successive names, each with its own spelling and alphabet. In both cases, we used contextual analysis, which involved manual checking of all data.

The processing of data contained by visual primary sources faced both linguistic and localization challenges because old maps of Russian or Austrian production do not have a homogeneous system of recording toponyms: for instance, on the Russian 1820 map we used, places in Transylvania that had Hungarian, Austrian, and Romanian names were inscribed, by using the Cyrillic alphabet, with only one of them. The cartographers’ preference for one of the three names lacks a clear rationale and requires methodological input from political history.

In Table 2, we provide a short list of transliteration challenges (see readme_DATABASE ON HAJDUK CRIMINALITY.txt in Digital Humanities Laboratory, 2026a):

Table 2

Transliteration challenges.

CHALLENGE	ORIGINAL FORM (CYRILLIC)	TRANSLITERATION (LATIN)	CURRENT ROMANIAN TOPONYM
Imprecise rendering in Cyrillic	Потлилои	Potliloi	Podu Iloaiei
Rendering in Cyrillic of Hungarian names	Марошъ Вашаргели	Maroș Vașargheli	Târgu-Mureș
Rendering in Cyrillic of German names	Елизабетштадъ	Elizabetștad	Dumbrăveni
Incompatibility between Cyrillic/Russian naming and current toponym	Пуцени	Puţeni	Valea Mărului
	ФелВинцъ	FelVinţ (in Hungarian Felvinc)	Unirea (named “Vinţu de Sus” until 1970)
	Пискельдъ	Pischeld (in Hungarian Piskolt; in German Pischkolt)	Pişcolţ
Places named with an initial capital letter	K.	Kiskapus	Copșa Mică
Places named with an initial capital letter	K.	Unidentified place	On the road between Târgu-Mureș and Sighișoara
Unidentified places	Иллюза	Illiuza	On the road to Bistrița, on the river Tiha Bârgăului
Changed roads and routes			The old route = Bucureşti – Copăceni – Pietrele – Giurgiu (~ 84 km on the Russian Map of 1820); Current route = Bucureşti – Călugăreni – Uzunu – Daia – Giurgiu (64 km on OSM)

The disambiguation of monetary-hoard data was carried out by carefully assessing place names associated with different geolocation coordinates. For instance, on the current territory of the Republic of Moldova, there are at least 3 villages named “Seliște” (in the counties Leova, Nisporeni, and Orhei). Similarly, several sites on both the left and right banks of the River Prut share the same toponym and were precisely determined following the evaluation of neighbouring areas.

The main disambiguation issue in wood/forest georeferencing stems from the fact that dendronyms are often identical to the toponyms of the large estates or counties where these geographical units were located. Another challenge arises from the relatively extensive areas of some woods/forests, which create the risk of multiple coordinate sets for the same geographic referent. Therefore, we decided to differentiate dendronyms from other toponyms by conventionally adding “Pădurea” before the wood’s name. This addition did not fix the issue of dendronyms shared by different geolocations. After alphabetic ordering, woods with the same coordinates were removed as duplicates, while shared dendronyms were given a numeric suffix (e.g. “Pădurea Dumbrava_1”, “Pădurea Dumbrava_2”, “Pădurea Dumbrava_3”, etc.). In cases of name variation, we selected the form closest to the site toponym: for instance, the entry “Negreștii” in TTM contains several overlapping dendronyms, “Pădurea Negrești”, “Pădurea Negreşti-Valea Mare”, “Pădurea Negreştilor”, and “Trupul Negreşti”, among which we chose “Pădurea Negrești” as the closest to the current corresponding address. In other cases, we kept the entire series of names: for example, the occurrences of phrases containing the noun “Codrul”/“Codrii” [forest/forests] have been preserved as such, even if this meant a relative doubling of dendronyms (e.g. “Codrul Țibăneștilor” [The Forest of Țibănești] – “Pădurea Țibăneştilor” [The Wood of Țibănești]). With contrastive data on forest raster provided by old maps and current OSM, we envisage documenting deforestation in Romanian historical regions as an extension of our repository for a criminality database.
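The ordering, duplicate removal, and numbering described above can be sketched in a few lines. This is one possible reading of the procedure, not the authors' script: it treats rows as duplicates only when both name and coordinates coincide, and the coordinates in the sample are invented.

```python
from collections import Counter, defaultdict


def dedupe_and_number(rows):
    """rows: iterable of (name, lon, lat); returns a cleaned, sorted list."""
    # 1. Alphabetic ordering, dropping rows that repeat name and coordinates.
    unique = sorted(set(rows))
    # 2. Names that still occur more than once refer to different places.
    counts = Counter(name for name, _, _ in unique)
    seen = defaultdict(int)
    result = []
    for name, lon, lat in unique:
        if counts[name] > 1:
            seen[name] += 1
            name = f"{name}_{seen[name]}"  # e.g. "Pădurea Dumbrava_1"
        result.append((name, lon, lat))
    return result


# Invented coordinates for illustration only.
woods = [
    ("Pădurea Dumbrava", 26.1, 46.0),
    ("Pădurea Dumbrava", 26.1, 46.0),  # exact duplicate, dropped
    ("Pădurea Dumbrava", 27.3, 47.1),  # same name, different place
    ("Pădurea Negrești", 27.4, 46.8),
]
cleaned = dedupe_and_number(woods)
for row in cleaned:
    print(row)
```

Because the suffix depends on sort order, adding new woods later can renumber existing entries; a stable ID column avoids relying on the suffix as an identifier.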

For documents that underwent AI processing, the challenge was not disambiguation, but the risk of losing relevant data through automated search, summarization, and coding. Therefore, we decided to add supplementary historical resources in order to enable checking of data extracted by the LLM:

attestations about gold extraction sites on rivers;
various general terminology used to connote the hajduk form of life or hajduks as persons/status (“haiduc”, “hânsar”, “gheaiduc”, “haramin”, “hoț”, “tâlhar”, “bandit”, “tâlhărie”).

For instance, the old noun “gheaiduc” [hajduk] is part of a Romanian mountain toponym (Muntele Ghelarilor), while the noun “hoț” [thief] occurs in contexts of theft of supplies rather than of hajduk criminality. The Romanian nouns “tâlhar” [robber, thief, bandit] and “tâlhărie” [robbery], as equivalents for hajduk and hajduk crime, are rather scarce in our primary sources.

Our approach to data extraction has limitations that need to be addressed in the future. The first is the relative imprecision of primary resources, chiefly maps. The second derives from the lack of homogeneity in the methodologies, descriptions of geographical neighbourhoods, and relative localizations used by Romanian toponymists. The third concerns the choice of a single reference point: the coordinates attached to a toponym might not be exact because of the site’s large size (Bucharest is now larger than the nineteenth-century city) or because of the site’s physiographic particularities (rivers, roads, mountains), which require different types of geometries (line and polygon rather than point).

(4.3) Quality Control and Filtering

After duplicates were removed and toponym spelling errors corrected, the CSV data underwent quality control and filtering through visual inspection, which consisted of successively adding layers in QGIS and identifying anomalies. Visualization proved to be a very effective approach for checking positional accuracy, as files could be fixed and saved while various types of analysis were underway (e.g. buffer). In this stage, which combines conceptualization on the one hand with data collection, structuring, and processing on the other, we were wary of over-filtering, which might hamper the development of this database. Conversely, we envisage enriching data sources for other fields such as archaeology, ethics, psychology, politics, and the arts. At the same time, we carefully documented the lineage of our data by creating a list of primary resources, “LIST of PRIMARY SOURCES. NONSTRUCTURED to STRUCTURED DATA for a DATABASE ON HAJDUK CRIMINALITY”, which is available at https://doi.org/10.5281/zenodo.18798966. Abbreviations (sigles) were added to this bibliographic list as a means of organizing primary sources and of effectively referencing them in the CSV files.

(5) Analysis and Discussion

The creation of maps in QGIS was the primary method for visualizing toponymic data extracted from the Hai-Ro literary corpus. Rather than viewing the landscape as a flat background, we used vector data and a layered, multidimensional approach to reconstruct the complexity of the hajduks’ world and form of life. All layers in QGIS were projected in WGS 84 (EPSG:4326). The first experiments in layering secondary data (see the section on data sourcing) over a historical nineteenth-century raster map (1820) encouraged us to bridge the gap between fictional space and geographic reality by adding more georeferenced data from various domains. Technically, QGIS treats this vector data as a table with a geometry column (in our case, geometry = point), which allows us to store attribute data (such as “village” vs. “city”, country names, county names, etc.) alongside geographic coordinates. This implies that our list of attributes, which is currently limited to “name/toponym”, might be expanded to forms of relief (mountain, river, passage, etc.), forms of administration and property (Ottoman “raya”, county, boyar estate, monastery estate, etc.), demographic data, and so forth. Consequently, each point on our map is not merely a visual symbol but a comprehensive data entry containing potentially complex information. Complementing these datapoints, line layers (important roads) drawn on old maps helped us to show the dynamic movement of the hajduks (see Figure 2).
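The idea of a point carrying an expandable attribute table can be made concrete with a GeoJSON sketch. The column names, the extra attributes (`relief`, `administration`), and the sample values are hypothetical; only the GeoJSON structure itself, with coordinates in longitude-latitude order, follows the published standard.

```python
import json


def to_feature(row, extra=None):
    """Build a GeoJSON Point feature; coordinates are [longitude, latitude]."""
    properties = {"name": row["Name"], "source": row["Sigle"]}
    properties.update(extra or {})  # room for future attribute classes
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [float(row["Longitude"]), float(row["Latitude"])],
        },
        "properties": properties,
    }


def to_collection(features):
    """Wrap features in a FeatureCollection, the usual layer container."""
    return {"type": "FeatureCollection", "features": list(features)}


# Invented row and attribute values for illustration only.
row = {"Name": "Example inn", "Sigle": "XYZ",
       "Longitude": "27.5", "Latitude": "47.2"}
feature = to_feature(row, extra={"relief": "river", "administration": "county"})
layer = to_collection([feature])
print(json.dumps(layer, ensure_ascii=False, indent=2))
```

New attribute classes can thus be added as further keys under `properties` without changing the geometry or breaking existing layers.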

Visual Representation of Dataset as Overlapped Layers (1820 Russian Map as Base Layer).

This map illustrates the distribution of the factual data across the Romanian regions mentioned above, along with the literary data marked in blue. By using categorized symbology, we ensured that various features, from post stations and monetary finds to forests, are clearly distinguishable. Because QGIS supports overlapping layers, we can see roads not just as isolated lines but as paths that intersect with historical administrative centres or natural landscapes. This composite view is essential for bridging the fictionalized spaces with real historical maps.

The difference between a static and a dynamic approach to spatiality was captured by using the buffering method. A buffer creates a zone of a specified distance around a geographic feature. Applied to literary analysis, buffering allowed us to calculate “the spheres of influence” of both the law (the administration of cities and towns) and the outlaw (the peripheries), albeit with some referencing errors that will be the object of a separate paper. For instance, by applying a 50 km buffer (in 25 km increments) around cities/military pickets, we were able to visualize the reach of state authority (see Figure 3). The results of this analysis reveal a “spatial tension” between the two forms of life: the military-administrative type and the paramilitary-hajduk type, respectively. The occurrences increase at the very edge of these buffers: this is where the “official” space of the state ends, and the “mythical” space of the hajduk begins.

Buffer Analyses of Criminality Hotspots in the Romanian Principalities.
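The logic of the 25 km rings can be approximated outside QGIS. The sketch below does not reproduce the QGIS buffer tool: it swaps in great-circle (haversine) distances to the nearest centre and labels each point by ring. The function names, the ring labels, and all coordinates are assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def ring_label(point, centres, step_km=25, max_km=50):
    """Label a (lon, lat) point by its distance ring from the nearest centre."""
    nearest = min(haversine_km(*point, *centre) for centre in centres)
    if nearest > max_km:
        return f">{max_km} km"
    # A point exactly at max_km belongs to the outermost ring.
    lower = min(int(nearest // step_km) * step_km, max_km - step_km)
    return f"{lower}-{lower + step_km} km"


# Invented coordinates: one centre and three points due north of it.
centres = [(26.0, 45.0)]
labels = [ring_label(p, centres)
          for p in [(26.0, 45.1), (26.0, 45.3), (26.0, 46.0)]]
print(labels)
```

Counting toponyms per label gives a rough measure of how occurrences are distributed between the inner ring, the outer ring, and the space beyond the buffers.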

(6) Reuse Potential

Even if initially conceived to serve as a docu-frame or as a “quantitative rendering” for the concept of “literary factuality” in nineteenth-century Romanian genre literature, our dataset proved to be remarkably malleable with respect to spatial analysis applied to other textual sources, chiefly non-fiction such as memoirs, letters, travelogues, essays, etc. belonging to the same historical period. Similarly, its quality recommends it for use as a test dataset in the preliminary stages of other humanities GIS projects. In what follows, we briefly survey the reuse scenarios that could be built upon the current structure of our repository. These scenarios are expected to become actionable (see Quickstart_Guide_QGIS.txt in Digital Humanities Laboratory, 2026a) through future adjustments to research questions that will set optimal analysis parameters and select appropriate data attributes for either GIS or non-GIS software. By overlaying different geometries, such as polygon and point, toponymists may achieve a more precise positioning of their place names. Historians are likely to find that, in a multi-layered visualization, archaeological, geographical, and map data provide the basis for new hypotheses and approaches to general phenomena such as criminality, circulation, urbanisation, deforestation, demographic development, etc. Geographers might also reconsider the utility of diachronic toponymy as a resource for checking current positional accuracy. All in all, the accessibility of our data, its interoperable formatting, as well as its understated interdisciplinarity make it a candidate resource for cascading projects related to the Balkan area.

Acknowledgements

The authors would like to thank the editors for their prompt response upon submission and reviewers for the constructive feedback that improved the manuscript of this paper.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

Roxana Patraș: Conceptualisation; Methodology; Formal analysis; Investigation; Software; Validation; Writing – original draft; Writing – review & editing; Project administration; Resources; Supervision.

Constantin Răchită: Data curation; Formal analysis; Writing – original draft; Resources; Software; Validation; Writing – review & editing.

Mihai-Bogdan Atanasiu: Conceptualisation; Methodology; Data curation; Formal analysis; Writing – original draft; Resources; Validation; Writing – review & editing.

Ana Odochiciuc: Formal analysis; Investigation; Resources; Software; Validation; Writing – review & editing.