Lost at Sea: A Dataset of 25+ SEA Words Morpho-Semantically Annotated in Ancient Greek and Latin

Andrea Farina

doi:10.5334/johd.139


(1) Overview

Repository location

https://doi.org/10.18742/23968773

Context

In linguistics, the semantic field SEA has been studied in different ways, from the semantics of verbs of navigation (e.g. Maisak & Rakhilina, 2007; Divjak et al., 2010; Lander et al., 2012 in linguistic typology; Farina, 2021 on Ancient Greek) to metaphors connected to the sea (e.g. Leotta & De Felice, forth. 2023 on Latin). In Greek and Roman culture, the sea holds a prominent position, militarily (Harris, 2017; Nash, 2018), economically (Reed, 2003; Wilkinson, 2020; Boardman et al., 2021), and culturally (Berens, 1979; Lindenlauf, 2004; Nikoloska, 2012; Beaulieu, 2016).

This dataset contains linguistic information about more than 25 nouns, verbs, and adjectives connected to the semantic field SEA in four Ancient Greek and Latin texts dating from the 5th to the 1st century BCE (Lat. De Bello Gallico by Caesar, Aeneid 1–6 by Vergil; AGr. Histories 1–2 by Herodotus, Argonautica by Apollonius Rhodius).

The dataset has been created to support research on how the concept of SEA is lexicalized in Ancient Greek and Latin poetry and prose, with a case study on four authors.¹

(2) Method

In this section, I summarize the steps that I followed to obtain the dataset presented here.

Steps

Text retrieval: after choosing the texts (see Section 1 and below), I downloaded them in .txt format from Perseus 5.0 – also called Scaife Viewer – of the Perseus Digital Library (Crane 1987; Crane et al. 2006).²
Text annotation: I then uploaded the texts to the annotation platform INCEpTION (Klie et al., 2018; see also Boullosa et al., 2018; de Castilho et al., 2018a; de Castilho et al., 2018b; Klie, 2018; Klie et al., 2020), developed by the Ubiquitous Knowledge Processing (UKP) Lab at TU Darmstadt. I created my annotation tagsets and layers based on the linguistic parameters of interest for my work, i.e. morphology, lemma, passage, semantics, meaning (literal, metaphorical, metonymic), and relations with proper nouns (see Section 3 for a more detailed description). Once the annotation was complete, I exported the data in the UIMA CAS XMI (XML 1.0) format.³
Data extraction and dataset creation: I used a Python script specifically designed for the UIMA framework to extract the annotated data. I created a dictionary keyed by token ID, in which each token is mapped to its values on the annotation layers. I then exported the dataset resulting from this extraction in CSV format.
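The extraction step above can be sketched roughly as follows. This is not the script used to build the dataset: it relies only on Python's standard library and on a toy XMI-like fragment, and the element names (`Token`, `Lemma`, `Meaning`), the attribute names, and the helper functions `extract_rows` and `to_csv` are all assumptions made for illustration. A real UIMA CAS XMI export uses namespaced types and character offsets into the document text.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Toy stand-in for an exported annotation file. Element and attribute
# names are hypothetical; a real UIMA CAS XMI export is namespaced.
SAMPLE_XMI = """\
<cas>
  <Token id="25716" begin="0" end="5" form="πόντῳ"/>
  <Token id="40294" begin="6" end="13" form="πέλαγος"/>
  <Lemma begin="0" end="5" value="πόντος"/>
  <Lemma begin="6" end="13" value="πέλαγος"/>
  <Meaning begin="0" end="5" value="Literal"/>
  <Meaning begin="6" end="13" value="Literal"/>
</cas>
"""

LAYERS = ("Lemma", "Meaning")


def extract_rows(xmi_text):
    """Return a dictionary keyed by token ID, one entry per token."""
    root = ET.fromstring(xmi_text)
    by_span = {}  # (begin, end) -> row, used to attach layer values
    rows = {}     # token ID -> row
    for tok in root.iter("Token"):
        row = {"TOKEN": tok.get("form"), "ID": tok.get("id")}
        by_span[(tok.get("begin"), tok.get("end"))] = row
        rows[tok.get("id")] = row
    # Attach each layer annotation to the token covering the same span.
    for layer in LAYERS:
        for ann in root.iter(layer):
            span = (ann.get("begin"), ann.get("end"))
            if span in by_span:
                by_span[span][layer.upper()] = ann.get("value")
    return rows


def to_csv(rows):
    """Serialize the token dictionary as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["TOKEN", "LEMMA", "MEANING", "ID"]
    )
    writer.writeheader()
    for row in rows.values():
        writer.writerow(row)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_XMI)
csv_text = to_csv(rows)
```

Keying the dictionary by token ID keeps one record per annotated token, so each annotation layer becomes one column of the resulting CSV.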

Sampling strategy

For this dataset, I decided to focus on two literary genres, i.e. historiography (Lat. De Bello Gallico by Caesar; AGr. Histories 1–2 by Herodotus) and epic poetry (Lat. Aeneid 1–6 by Vergil; AGr. Argonautica by Apollonius Rhodius). Given that I also wanted to investigate the distribution of SEA words in Ancient Greek and Latin, I selected parts of these texts according to their final number of tokens. To maintain a balance between the Latin and Greek sub-corpora, some texts (Herodotus’s Histories and Vergil’s Aeneid) have not been fully annotated. Overall, my corpus has 174,501 tokens. The Greek sub-corpus constitutes 53% of the whole corpus, and it has 92,592 tokens (53,750 for prose and 38,842 for poetry). The Latin sub-corpus has 81,909 tokens (51,313 for prose and 30,596 for poetry).
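The token counts reported above can be cross-checked with a few lines of Python. The figures are copied from the text; the variable names are illustrative only.

```python
# Token counts per language and genre, as reported in the text.
counts = {
    ("Greek", "prose"): 53_750,
    ("Greek", "poetry"): 38_842,
    ("Latin", "prose"): 51_313,
    ("Latin", "poetry"): 30_596,
}

# Size of the whole corpus.
total = sum(counts.values())

# Size of each language sub-corpus.
by_language = {}
for (language, _genre), n in counts.items():
    by_language[language] = by_language.get(language, 0) + n

# Share of each sub-corpus, rounded to whole percentages.
shares = {lang: round(100 * n / total) for lang, n in by_language.items()}
```

With these figures the sub-corpus totals add up to the reported corpus size, and the Greek share rounds to 53%.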

(3) Dataset Description

The nouns, verbs, and adjectives included in this dataset are:

NOUNS: AGr. thálassa, póntos, pélagos, háls, Lat. mare, pontus, pelagus, aequor ‘sea’; AGr. húdōr, Lat. aqua, lympha ‘water’; AGr. háls, Lat. sal ‘sea’, ‘salt’; AGr. kûma, Lat. unda, fluctus ‘wave’; Lat. litus, ripa ‘shore’;⁴
VERBS: AGr. pléō (and its preverbed forms occurring in the analyzed texts), Lat. navigo ‘sail’;⁵
ADJECTIVES: AGr. thalássios, póntios, Lat. marinus, maritimus ‘maritime, marine’.⁶

In the CSV file, annotations are represented with ten columns and as many rows as the number of SEA tokens in each of the considered texts. Columns provide: (1) the token (TOKEN); (2) its morphological analysis (MORPHOLOGICAL FEATURES); (3) its lemma (LEMMA); (4) its part of speech (POS); (5) the sentence in which the token is found (PASSAGE); (6) the type of token meaning (literal, metaphorical, or metonymic), according to cognitive linguistics and the new WordNets for ancient Indo-European languages (Biagetti et al. 2021) (MEANING); (7) its meaning in context using synsets from the WordNets, preceded by a unique identifier (SYNSET); (8) the token ID (ID); (9) any words (proper nouns or adjectives) in Ancient Greek or Latin to which a noun meaning ‘sea’ refers (REFERS TO); (10) the meaning of the phrase resulting from (1) and (9), using synsets from the WordNets, preceded by their unique identifier (DENOTES). An excerpt of the dataset is given in Table 1.

Table 1

An excerpt of the dataset (13 rows of Apollonius Rhodius’s Argonautica).

TOKEN	MORPHOLOGICAL FEATURES	LEMMA	POS	PASSAGE	MEANING	SYNSET	ID	REFERS TO	DENOTES
ἁλὸς	Case=Gen|Gender=Fem|Number=Sing	ἅλς	NOUN	ἔνθ᾽ ἄρα τοίγε ἑςπέριοι ἀνέμοιο παλιμπνοίῃςιν ἔκελςαν, καί μιν κυδαίνοντες ὑπὸ κνέϕας ἔντομα μήλων κεῖαν, ὀρινομένης ἁλὸς οἴδματι	Literal	‘n#06781694 a large body of water constituting a principal part of the hydrosphere’	25434
πόντῳ	Case=Dat|Gender=Masc|Number=Sing	πόντος	NOUN	ἠῶθεν δ᾽ Ὁμόλην αὐτοςχεδὸν εἰςορόωντες πόντῳ κεκλιμένην παρεμέτρεον	Literal	‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’	25716
ἁλὸς	Case=Gen|Gender=Fem|Number=Sing	ἅλς	NOUN	λάρνακι δ᾽ ἐν κοίλῃ μιν ὕπερθ᾽ ἁλὸς ἧκε ϕέρεςθαι, αἴ κε ϕύγῃ	Literal	‘n#06781694 a large body of water constituting a principal part of the hydrosphere’	26911
πόντον	Case=Acc|Gender=Masc|Number=Sing	πόντος	NOUN	ἀλλὰ γὰρ ἔμπης ἦ θαμὰ δὴ πάπταινον ἐπὶ πλατὺν ὄμμαςι πόντον δείματι λευγαλέῳ, ὁπότε Θρήικες ἴαςιν	Metonymic	‘n#06783379 the part of the sea that can be seen from the shore’	27311
ἁλὶ	Case=Dat|Gender=Fem|Number=Sing	ἅλς	NOUN	περὶ γὰρ βαθυλήιος ἄλλων νήςων, Αἰγαίῃ ὅςαι εἰν ἁλὶ ναιετάουςιν	Metonymic	‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’	36044	[‘Αἰγαίῃ’]	[‘n#06806923 an arm of the Mediterranean between Greece and Turkey; a main trade route for the ancient civilizations of Crete and Greece and Rome and Persia’]
ἀναπλώοντι	Case=Dat|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act	ἀναπλέω	VERB	εἰ δ᾽ οὔ μοι πέπρωται ἐς Ἑλλάδα γαῖαν ἱκέςθαι τηλοῦ ἀναπλώοντι, ςὺ δ᾽ ἄρςενα παῖδα τέκηαι	Literal	‘v#01260993 travel by boat’	39277
ὕδωρ	Case=Acc|Gender=Neut|Number=Sing	ὕδωρ	NOUN	ἔνθ᾽ ἄρα τοίγε κόπτον ὕδωρ δολιχῇςιν ἐπικρατέως ἐλάτῃςιν	Literal	‘n#10771040 water containing salts’	39681
ἅλα	Case=Acc|Gender=Fem|Number=Sing	ἅλς	NOUN	ὄϕρα δαέντες ἀρρήτους ἀγανῇςι τελεςϕορίῃςι θέμιςτας ςωότεροι κρυόεςςαν ὑπεὶρ ἅλα ναυτίλλοιντο	Literal	‘n#06781694 a large body of water constituting a principal part of the hydrosphere’	39864
πόντου	Case=Gen|Gender=Masc|Number=Sing	πόντος	NOUN	κεῖθεν δ᾽ εἰρεςίῃ Μέλανος διὰ βένθεα πόντου ἱέμενοι τῇ μὲν Θρῃκῶν χθόνα, τῇ δὲ περαίην Ἴμβρον ἔχον καθύπερθε	Literal	‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’	40063	[‘Μέλανος’]	[‘n#06810637 a sea between Europe and Asia; a popular resort area of eastern Europeans’]
πέλαγος	Case=Acc|Gender=Neut|Number=Sing	πέλαγος	NOUN	πέλαγος δὲ τὸ μὲν καθύπερθε λέλειπτο ἦρι	Literal	‘n#06783080 an especially deep part of a sea or ocean’	40294
ἅλα	Case=Acc|Gender=Fem|Number=Sing	ἅλς	NOUN	ἔςτι δέ τις αἰπεῖα Προποντίδος ἔνδοθι νῆςος τυτθὸν ἀπὸ Φρυγίης πολυληίου ἠπείροιο εἰς ἅλα κεκλιμένη	Literal	‘n#06781694 a large body of water constituting a principal part of the hydrosphere’	40710
ὕδατος	Case=Gen|Gender=Neut|Number=Sing	ὕδωρ	NOUN	ἐν δέ οἱ ἀκταὶ ἀμϕίδυμοι, κεῖνται δ᾽ ὑπὲρ ὕδατος Λἰςήποιο	Metonymic	‘n#06789983 a large natural stream of water (larger than a creek)’	40823
ἁλός	Case=Gen|Gender=Fem|Number=Sing	ἅλς	NOUN	ἠοῖ δ᾽ εἰςανέβαν μέγα Δίνδυμον, ὄϕρα καὶ αὐτοὶ θηήςαιντο πόρους κείνης ἁλός	Literal	‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’	42805
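For reuse, the MORPHOLOGICAL FEATURES and SYNSET cells shown in Table 1 can be split into structured values. The following is a minimal sketch: the function names `parse_features` and `parse_synset` are hypothetical, and the handling of quotation marks and of an escaped separator is an assumption about how the cells may be stored in the CSV file.

```python
import re


def parse_features(features):
    """Split a 'Key=Value|Key=Value' string into a dictionary."""
    # Assumption: the separator may appear escaped as '\|' in some copies.
    cleaned = features.replace("\\|", "|")
    return dict(pair.split("=", 1) for pair in cleaned.split("|") if pair)


# Identifier (part of speech, '#', offset), then the gloss, with optional
# surrounding quotation marks.
SYNSET_RE = re.compile(
    r"^[‘'\"]?(?P<pos>[a-z])#(?P<offset>\d+)\s+(?P<gloss>.*?)[’'\"]?$"
)


def parse_synset(synset):
    """Return (identifier, gloss) for a SYNSET cell."""
    match = SYNSET_RE.match(synset.strip())
    if match is None:
        raise ValueError(f"unrecognised synset cell: {synset!r}")
    identifier = f"{match.group('pos')}#{match.group('offset')}"
    return identifier, match.group("gloss")


features = parse_features(
    "Case=Dat|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act"
)
identifier, gloss = parse_synset("‘v#01260993 travel by boat’")
```

Separating the identifier from the gloss makes it possible to group tokens by synset without comparing long free-text definitions.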

Object name

25+ SEA words morpho-semantically annotated in Ancient Greek and Latin.

Format names and versions

CSV

Creation dates

From 2023-07-07 to 2023-08-10

Dataset creators

Andrea Farina (Department of Digital Humanities, King’s College London): conceptualization, data curation, methodology, formal analysis, data retrieval.

Language

Ancient Greek, Latin, English

License

CC0

Repository name

Figshare

Publication date

2023-08-18

(4) Reuse Potential

Because this dataset describes the semantics of words belonging to the semantic field SEA in Ancient Greek and Latin, its most immediate reuse potential lies in linguistics. The dataset supports both onomasiological and semasiological analyses. It can be expanded to include other works, authors, and literary genres, giving a broader overview of SEA words in Ancient Greek and Latin. Similar datasets may also be built for other semantic fields and/or languages, allowing cross-linguistic comparisons either synchronically or diachronically. Moreover, this dataset could serve as the basis for training a model for automatic semantic annotation based on co-occurring words, which can be extracted from the passage in which a token occurs.
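The two perspectives can be illustrated with the rows of Table 1. In the sketch below, the (LEMMA, SYNSET identifier) pairs are taken from that excerpt, while the variable names are illustrative only. An onomasiological view asks which lemmas express a given concept; a semasiological view asks which concepts a given lemma expresses.

```python
from collections import defaultdict

# (LEMMA, SYNSET identifier) pairs taken from the rows of Table 1.
observations = [
    ("ἅλς", "n#06781694"),
    ("πόντος", "n#06781925"),
    ("ἅλς", "n#06781694"),
    ("πόντος", "n#06783379"),
    ("ἅλς", "n#06781925"),
    ("ἀναπλέω", "v#01260993"),
    ("ὕδωρ", "n#10771040"),
    ("ἅλς", "n#06781694"),
    ("πόντος", "n#06781925"),
    ("πέλαγος", "n#06783080"),
    ("ἅλς", "n#06781694"),
    ("ὕδωρ", "n#06789983"),
    ("ἅλς", "n#06781925"),
]

lemmas_by_synset = defaultdict(set)  # onomasiological: concept -> words
synsets_by_lemma = defaultdict(set)  # semasiological: word -> concepts
for lemma, synset in observations:
    lemmas_by_synset[synset].add(lemma)
    synsets_by_lemma[lemma].add(synset)

# Concepts expressed by more than one lemma in the excerpt.
shared_concepts = sorted(
    synset for synset, lemmas in lemmas_by_synset.items() if len(lemmas) > 1
)
```

In this excerpt, only the synset n#06781925 is expressed by two different lemmas (ἅλς and πόντος), whereas ἅλς, πόντος, and ὕδωρ are each associated with two synsets.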

This dataset, or other similar datasets, may also be employed in literary-geographical studies to evaluate, for instance, how a specific place, such as a sea, is referred to in different texts and/or geographical areas, synchronically or diachronically, and whether the proper noun of a sea tends to occur alone or with one or more common nouns. This may cast new light on geographical denominations in the ancient world. In this sense, it may also be used to expand existing online resources, such as Pelagios⁷ (Simon et al., 2012; Barker et al., 2016; Simon et al., 2016; Kahn et al., 2021; Vitale et al., 2021), or to add further historical depth to the World Historical Gazetteer⁸ (Manning & Mostern, 2015; Manning, 2015; Mostern, 2017), grouping together places that were known by more than one name.
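As a rough illustration of such a study, the REFERS TO and DENOTES columns can be used to group the expressions attested for each named sea. The rows below are modelled on Table 1, but their representation is simplified: here each cell holds a single plain string, which is an assumption, and the function name `group_by_place` is hypothetical.

```python
from collections import defaultdict

# Simplified rows modelled on Table 1. REFERS TO and DENOTES hold a single
# plain string here; the cells in the CSV file may be list-like (assumption).
rows = [
    {"LEMMA": "ἅλς", "REFERS TO": "Αἰγαίῃ", "DENOTES": "n#06806923"},
    {"LEMMA": "πόντος", "REFERS TO": "Μέλανος", "DENOTES": "n#06810637"},
    {"LEMMA": "πέλαγος", "REFERS TO": "", "DENOTES": ""},
    {"LEMMA": "ἅλς", "REFERS TO": "", "DENOTES": ""},
]


def group_by_place(rows):
    """Map each denoted place to its (common noun, name form) pairs."""
    places = defaultdict(list)
    for row in rows:
        if row["REFERS TO"]:
            places[row["DENOTES"]].append((row["LEMMA"], row["REFERS TO"]))
    return dict(places)


places = group_by_place(rows)

# Number and share of SEA tokens that combine with a proper noun or adjective.
named = sum(len(pairs) for pairs in places.values())
share_named = named / len(rows)
```

Grouping by the DENOTES identifier rather than by the surface form brings together different expressions for the same place, which is the kind of information a gazetteer entry could record.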

Finally, more broadly, cross-linguistic analyses conducted in a cognitive framework also allow for psycho-anthropological studies that can address questions such as: How many words did the Greeks and the Romans possess to express one or more concepts related to SEA? How and why does the number of SEA words vary in Greek and Roman texts? How can we account for similarities and differences in this sense? Does this reveal anything about these populations from the cultural point of view?

Notes

[1] The results of this study were presented at ICHL26, the International Conference on Historical Linguistics (Heidelberg, Germany, 4–8 September 2023), by Andrea Farina, William Michael Short, and Barbara McGillivray.

[2] https://scaife.perseus.org (Last accessed: 27 October 2023).

[3] https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xmi (Last accessed: 27 October 2023).

[4] No occurrences of the AGr. counterpart paralía ‘shore’ were found in these texts.

[5] No occurrences of preverbed forms of Lat. navigo were found in these texts.

[6] No occurrences of Lat. pelagius ‘maritime, marine’ were found in these texts.

[7] https://pelagios.org (Last accessed: 27 October 2023).

[8] https://whgazetteer.org (Last accessed: 27 October 2023).

Acknowledgements

I would like to thank Dr Barbara McGillivray for her valuable linguistic, computational, and stylistic feedback on this work. I also thank Paola Marongiu, who read a preliminary version of the paper.

Funding Information

This dataset was produced for a paper presented at the International Conference on Historical Linguistics (see fn. 1), the expenses for which were covered by the AHRC London Arts & Humanities Partnership.

Competing interests

I am guest editor of the special collection Representing the Ancient World through Data and social media manager of this journal, and I did not take part in the editorial process pertaining to this manuscript.