Editorial: Representing the Ancient World through Data

Open Access | Dec 2024

(1) Context and motivation

In recent years, the humanities have increasingly embraced data-driven methodologies, with a growing interest in quantitative studies and the use of digital tools (Owens 2011; Mayer-Schönberger & Cukier 2013; Schöch 2013; Graban et al. 2019; Harrower et al. 2020). This shift has greatly benefited fields like Ancient World studies, where the creation and use of datasets have become essential. Whether or not scholars conceptualise their materials as datasets, they have long relied on well-curated data, from textual corpora and field reports to collections of inscriptions and museum catalogues. The well-known principle applies here as in any other field: the quality of the input limits the quality of the output. In a world of technological connections, it has become crucial to ensure not just that research is of the highest standard, but also that the underlying data is shared to enable reuse and reproducibility to the widest extent.

Nevertheless, venues that prioritise the publication of Ancient World data are still scarce, which leads to a widespread lack of recognition for work such as corpus design or data curation. In this respect, the landscape of Ancient World studies does not differ significantly from that of other disciplines: in scientific fields, the proper avenues for publishing and crediting datasets have been the subject of discussion for a while (Pierce et al. 2019). In response, the data paper has emerged as a valuable tool for addressing this gap in the humanities (McGillivray et al. 2022; Wigdorowitz et al. 2024) and, therefore, in Ancient World studies: rather than presenting new interpretations or findings, its primary focus is on describing a dataset in detail. This includes the methods of its collection, its structure, and its reuse potential. By publishing data papers, scholars ensure that their work is accessible, reusable, and formally recognised within the academic community. Such publications provide crucial visibility for the labour-intensive process of compiling datasets, which may otherwise go unnoticed in conventional research publications. As a result, the development of venues like this special collection is essential in fostering the recognition of data-driven contributions and encouraging researchers to share their data openly and transparently. Despite these advances, the specific characteristics of Ancient World data, such as their fragmentary and often incomplete nature and the lack of interpretative frameworks accompanying the data themselves, create a unique set of challenges. In this context, the systemic under-valuing of data-centred contributions leads to missed opportunities not just within the field (in that the creation of new datasets of the highest possible quality is discouraged), but also beyond it, as work on Ancient World data can offer methodological contributions that are relevant to other similarly fragmentary categories of data.

Our special collection Representing the Ancient World through Data aimed to address this gap by inviting the submission of data-centred papers. We sought to broaden the scope of our selection as much as possible: our definition of the Ancient World was expansive in its chronological, linguistic, and cultural reach, and we were keen to include not just textual, literary, and linguistic corpora, but also geographical, historical, papyrological, and archaeological collections, as well as computational tools and resources. Although the collection was designed to be open to studies across the Ancient World without geographical limits, the majority of the submissions received focused on the Ancient Mediterranean area. This outcome may reflect, in part, the professional network of the editors, but it may also signal that significant progress is still to be made in collecting data and producing data papers for other geographical areas of the Ancient World. Given the journal’s broader focus on the humanities, we were particularly interested in submissions that would highlight the deep interdisciplinarity that characterises the field by bringing it into contact with other humanities subjects of general interest. In our planning, several factors came together to ensure that datasets published in this special issue would be widely distributed and properly recognised: the focus of the special issue on short data papers, the open-access model of the Journal of Open Humanities Data (JOHD), and the high visibility of the journal on social media. It must be noted, however, that the journal is unfortunately fee-paying, a model that is difficult to escape in the open-access landscape.

(2) Description

This special collection presents a curated compilation of datasets spanning a diverse array of subjects (linguistics, literature, archaeology, geography, religious studies, history), ancient and modern languages (Greek, Latin, Akkadian, Sumerian, English, German, French, Portuguese), and geographical regions (mainly the Mediterranean and the Ancient Near East) of the Ancient World. The collection not only showcases the breadth of data-driven research but also underscores the immense potential that lies in the integration and accessibility of such datasets for interdisciplinary scholarship.

A significant portion of this collection is dedicated to linguistic datasets, many of which leverage computational tools to analyse ancient languages and texts. Dexter et al. contribute a large-scale intertextuality dataset derived from modern commentaries on Valerius Flaccus’ epic poem Argonautica. This dataset provides a robust foundation for studying intertextual relationships within Latin literature and demonstrates the utility of tools like Fīlum for intertextuality detection. Similarly, Ong presents a morphosyntactic annotation of the State Archives of Assyria online (SAAo) letter corpus, comprising approximately 2,600 letters from the Neo-Assyrian kings and including detailed annotations of parts of speech, lemmas, and syntactic dependencies. Ong and Gordin introduce a dataset of 2,400 Akkadian metaphors, specifically Body Part Constructions (BPCs), further enriching the linguistic resources available for Akkadian studies. Romanello and Najem-Meyer offer a multilingual named entity corpus from 19th-century commentaries on Sophocles’ Ajax. This corpus, annotated across English, German, French, Latin, and Greek, serves as a valuable resource for evaluating information extraction systems in multilingual and multiscript contexts. Meanwhile, Farina’s dataset of over 25 Greek and Latin words related to the semantic field of the sea provides morphosyntactic and semantic annotations that are instrumental for cross-linguistic semantic analyses and interdisciplinary research across literature, geography, and anthropology. Pratali Maffei’s compilation of Hellenistic epigrams from Doric-speaking regions presents a searchable and reproducible database that invites literary, linguistic, and historical analyses. Complementing these contributions, Mambrini and Passarotti’s dataset of 215,102 Latin dictionary forms, structured through RDF triples, enhances the interoperability of Latin language resources within the LiLa: Linking Latin project (Passarotti & Mambrini 2021). Palladino et al. introduce aligned translation datasets for Ancient Greek in English, Portuguese, and Latin, establishing a gold standard for translation alignment models and serving as high-quality training data for machine learning applications. Finally, Bru’s dataset on average word lengths in Classical and Post-classical Greek provides empirical evidence for diachronic linguistic changes, offering a valuable resource for historical linguists.

The collection also delves into the rich tapestry of Mesopotamian civilisation through extensive cuneiform datasets. Chen et al. present CuneiML, a meticulously curated dataset of Unicode transcriptions, transliterations, and metadata for a collection of Sumerian and Akkadian cuneiform tablets. This dataset is designed to support machine learning applications for classifying genres, provenance, and periods of cuneiform artefacts, thereby advancing digital cuneiform studies. Cobanoglu et al. contribute a dynamic corpus of transliterated cuneiform tablets from the Electronic Babylonian Library (eBL) platform, featuring a public API and a Python library for parsing transliterations. With approximately 25,000 tablets and over 350,000 lines of text, this corpus represents a substantial resource for both cuneiform scholars and computational linguists. Complementing these efforts, Clark and Gordin’s Mesopotamian Ancient Place-Names Almanac (MAPA) integrates textual sources with remote sensing data to reconstruct the social and physical geography of Uruk and its hinterland. This gazetteer, adhering to linked open data protocols, encompasses nearly 400 placenames from diverse imperial archives, facilitating nuanced geographical and historical analyses.

The integration of archaeological data with geospatial technologies is another cornerstone of this collection. Hagmann’s overview of the Roman Rural Landscapes in Noricum (RRLN) project highlights the use of the PHAIDRA system (University of Vienna 2008) for archiving archaeological data in compliance with FAIR principles. By combining open geodata with unstructured datasets within a Geographic Information System (GIS) framework, the project enhances our understanding of Roman rural landscapes in the less-explored regions of Noricum. Hunziker and Graml’s research-based teaching project introduces a data structure concept for analysing ancient sanctuaries, incorporating landscape settings, cult practices, and worshipped deities. Their preliminary GIS-based graph database exemplifies how large and diverse datasets can deepen our comprehension of Greek religious practices. Additionally, Laguna-Palma underscores the importance of open data practices in archaeo-historical research through the PERAIA project, which offers a comprehensive gazetteer of archaeological and heritage sites in the Eastern Mediterranean. By integrating legacy data with aerial and satellite imagery, this platform not only enhances data accessibility but also facilitates the discovery of previously unknown sites.

The collection also features significant contributions to textual and religious studies. Bilby and BeDuhn’s reconstruction of Marcion’s Evangelion provides an updated Greek version of the corresponding English reconstruction, available in multiple formats to accommodate various research needs. Bilby et al.’s reconstructions of Marcion’s Apostolos, in turn, represent the first attempt at a digital edition of this text, and are lemmatised and morphologically tagged. All reconstructions adhere to normalisation standards, ensuring compatibility with other scholarly efforts and facilitating both human reading and machine-readable analyses.

Finally, Dosi’s DataCons Project offers an open-access dataset of late Roman consular dating formulae from 284 to 541 CE, aggregating over 4,800 documents across ten regions and three scripts. This dataset, currently focusing on Latin and Greek documentation, supports interdisciplinary research on consular materials and political history. The ongoing expansion of this dataset, coupled with the forthcoming online relational database, promises to be a useful resource for historians and political scientists.

(3) Discussion

At the time of writing, Representing the Ancient World through Data is JOHD’s most successful special collection, with 18 published articles, 17,100 views, and 1,824 downloads. The majority of the published articles (16) are data papers, while two are discussion papers. The success of this special collection has led us to examine the role of data-driven papers in traditionally less data-centric fields, such as Ancient World studies.

There is certainly a significant need among such disciplines for platforms where researchers can describe their work and showcase new datasets. This need has perhaps never been fully addressed by a dedicated venue that recognises the importance of, and the effort involved in, creating datasets for Ancient World studies. When we launched the call for papers for our special collection, several potential authors (particularly scholars who do not work with computational methods and whose research is more qualitatively focused) expressed uncertainty about whether their material was suitable for a data paper. Some of them were unfamiliar with the concept of a dataset or unaware that their research actually involved one. Nonetheless, they ended up submitting valuable contributions, and some of their datasets present complex structures with multiple levels of information, promising a high potential for reuse. This convinced us that researchers in our field are increasingly recognising the value of their data and are willing to contribute to the data-sharing ecosystem. Moreover, the growing engagement with data papers indicates a broader awareness of how sharing detailed datasets can enhance the scholarly impact and usability of their research, fostering a more inclusive approach to data in Ancient World studies.

We can draw two main conclusions from our work on this special collection, concerning the creation of datasets and their reuse potential respectively. First, dataset creation is now an almost essential component of the research process in disciplines related to the Ancient World as well. Some of these datasets may reach high levels of complexity (e.g. Mambrini and Passarotti for linguistic resources; Hunziker and Graml for archaeology and religious studies). Scholars of the Ancient World need dedicated venues to discuss their data collection and curation processes, which differ significantly from those of the STEM (Science, Technology, Engineering, Mathematics) disciplines because of the nature of the data. In some cases, data on the Ancient World may be extremely fragmented and scattered across various sources that are often difficult to access (e.g. Dosi’s DataCons Project). Moreover, fields that have long relied on traditional research methods can leverage digital and computational tools to produce new datasets and data-informed research outcomes. Examples are given in Chen et al. and Cobanoglu et al., which illustrate different tasks of automatic classification applied to cuneiform tablets. Second, much of the strength of a dataset lies in its reuse potential. Different disciplines within the field of Ancient World studies produce data that can be reused by researchers from other sub-disciplines, even when those researchers specialise in different areas. For instance, linked data resources such as Pelagios (Simon et al. 2012), originally designed for geographical and historical research, can also support literary or historiographical studies. Similarly, lemmatisations contained in projects such as the Perseus Digital Library (Crane 1987; Crane et al. 2006) and the Diorisis Ancient Greek Corpus (Vatri & McGillivray 2018) may be essential for computational linguistics research on distributional semantics. Such reuse not only allows the creators of a dataset to gain visibility and recognition for their work but also enables the scientific community to advance its understanding of the Ancient World by using tools and data generated from research questions that may be extremely specific but have very diverse applications. The purpose of a special collection within JOHD is precisely to spotlight micro-areas of the humanities, such as Ancient World data, in order to support and address specific challenges of data collection, publication, and reuse in those domains. For these reasons, we encouraged our authors to describe in detail the reuse potential of their dataset in the dedicated section, underscoring its potential for applications beyond their specific domain (e.g. Farina).

When researchers in disciplines traditionally relying on qualitative and non-computational methods, such as Ancient World studies, realise that their data can be described and published in a way similar to practices in the STEM fields, they begin to uncover the benefits of publishing data papers. A data paper does not work alone but is rather part of a virtuous cycle in which the dataset it describes, any associated code, and the published research results derived from the dataset all reference one another (McGillivray et al. 2022). In this way, the dataset description in the open repository will reference the data paper illustrating the dataset and the creation process. Conversely, the data paper will include a reference to the dataset hosted in the open repository. Finally, the research article will contain references to the deposited dataset and the data paper. This interlinked visibility benefits authors through increased citations and ensures broader dissemination of the dataset, enhancing the chances of reuse and contributing to the advancement of research in the field.
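The cross-referencing cycle described above can be sketched as a small consistency check. This is only an illustrative sketch, not a JOHD or repository tool: the `Record` class, the `EXPECTED_LINKS` table, the `missing_links` function, and all identifiers are hypothetical names introduced here, and the rules encode only the three links named in the text (dataset to data paper, data paper to dataset, research article to both).

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One citable output: a deposited dataset, a data paper, or a research article."""
    identifier: str  # hypothetical placeholder, not a real DOI
    kind: str        # "dataset", "data_paper", or "research_article"
    references: set[str] = field(default_factory=set)


# Assumption: these are the only links we check, taken from the description above.
EXPECTED_LINKS = {
    "dataset": {"data_paper"},
    "data_paper": {"dataset"},
    "research_article": {"dataset", "data_paper"},
}


def missing_links(records: list[Record]) -> list[tuple[str, str]]:
    """Return (identifier, missing kind) pairs for every expected reference that is absent."""
    by_id = {r.identifier: r for r in records}
    missing = []
    for record in records:
        # Kinds of records that this record actually cites.
        cited_kinds = {by_id[ref].kind for ref in record.references if ref in by_id}
        for expected in sorted(EXPECTED_LINKS.get(record.kind, set()) - cited_kinds):
            missing.append((record.identifier, expected))
    return missing


# Hypothetical example: the article cites the dataset but not the data paper.
dataset = Record("example-dataset", "dataset", {"example-data-paper"})
data_paper = Record("example-data-paper", "data_paper", {"example-dataset"})
article = Record("example-article", "research_article", {"example-dataset"})

print(missing_links([dataset, data_paper, article]))
# → [('example-article', 'data_paper')]
```

In this toy example the cycle is incomplete only because the research article omits the data paper; adding that reference would make the check return an empty list.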

Initiatives such as our special collection, which specifically target scholars working on the Ancient World, not only enhance the visibility of datasets and of researchers working with data across various sub-disciplines, such as (computational) linguistics, geography, history, and archaeology, but also raise awareness of the value of data sharing and reuse in these disciplines.

(4) Conclusions

This special collection represents a major effort in the field of Ancient World studies to promote the visibility and open sharing of research data. The community’s interest in the collection, reflected both in the number of published papers and in its readership (with over 17,000 views across all articles at the time of writing), indicates a receptive environment for data sharing and reusability, practices that have traditionally been better established in STEM disciplines. This evolution in the field is evident not only in the growing practice of sharing datasets but also in the increasing publication of data papers, exemplified by the success of the collection itself. While sharing datasets ensures that raw data is accessible for reuse, publishing data papers offers scholars the opportunity to provide essential context, describe methodologies, and gain formal recognition for the often complex task of compiling fragmented and dispersed information. By making these datasets both accessible and citable, researchers contribute significantly to the broader scientific community.

We hope that this special collection will encourage our readers to view their work from a fresh perspective, appreciating not only the value of data in their research but also the significant effort involved in compiling and curating the datasets that underpin traditional research articles. By highlighting the importance of data-driven approaches, we aim to prompt scholars to embrace open data practices and thereby promote transparency and collaboration. We envision further initiatives that continue to bridge the gap between traditional disciplines and digital and computational methodologies, fostering greater innovation and cross-disciplinary collaboration in the study of the Ancient World.

Competing Interests

The authors are all involved with the Journal of Open Humanities Data beyond their role as Special Collection editors. Andrea Farina is Associate Editor and Social Media Editor, Paola Marongiu is Lead Social Media Editor, and Mar A Rodda is a Copyeditor.

Author Contributions

Andrea Farina: Conceptualisation, Writing – original draft, Writing – review & editing.

Paola Marongiu: Writing – original draft, Writing – review & editing.

Mar A Rodda: Writing – original draft, Writing – review & editing.

DOI: https://doi.org/10.5334/johd.245 | Journal eISSN: 2059-481X
Language: English
Submitted on: Sep 19, 2024 | Accepted on: Sep 19, 2024 | Published on: Dec 17, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Andrea Farina, Paola Marongiu, Mar A. Rodda, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.