(1) Overview
Repository location
Context
In linguistics, the semantic field SEA has been studied in different ways, from the semantics of verbs of navigation (e.g. Maisak & Rakhilina, 2007; Divjak et al., 2010; Lander et al., 2012 in linguistic typology; Farina, 2021 on Ancient Greek) to metaphors connected to the sea (e.g. Leotta & De Felice, forth. 2023 on Latin). In Greek and Roman culture, the sea holds a prominent position, militarily (Harris, 2017; Nash, 2018), economically (Reed, 2003; Wilkinson, 2020; Boardman et al., 2021), and culturally (Berens, 1979; Lindenlauf, 2004; Nikoloska, 2012; Beaulieu, 2016).
This dataset contains linguistic information about more than 25 nouns, verbs, and adjectives connected to the semantic field SEA in four Ancient Greek and Latin texts dating from the 5th to the 1st century BCE (Lat. De Bello Gallico by Caesar, Aeneid 1–6 by Vergil; AGr. Histories 1–2 by Herodotus, Argonautica by Apollonius Rhodius).
The dataset has been created to support research on how the concept of SEA is lexicalized in Ancient Greek and Latin poetry and prose, with a case study on four authors.[1]
(2) Method
In this section, I summarize the steps that I followed to obtain the dataset presented here.
Steps
Text retrieval: after choosing the texts (see Section 1 and below), I downloaded them in .txt format from Perseus 5.0 – also called Scaife Viewer – of the Perseus Digital Library (Crane, 1987; Crane et al., 2006).[2]
Text annotation: I then uploaded the texts to the annotation platform INCEpTION (Klie et al., 2018; see also Boullosa et al., 2018; de Castilho et al., 2018a; de Castilho et al., 2018b; Klie, 2018; Klie et al., 2020), developed by the Ubiquitous Knowledge Processing (UKP) Lab at TU Darmstadt. I created my own annotation tagsets and layers, based on the linguistic parameters that were of interest for my work, i.e. morphology, lemma, passage, semantics, meaning (literal, metaphorical, metonymic), and relations with proper nouns (see Section 3 for a more detailed description). Once the annotation was complete, I exported the data in the UIMA CAS XMI (XML 1.0) format.[3]
Data extraction and dataset creation: I used a Python script specifically designed for the UIMA framework to extract the annotated data. I created a dictionary based on token IDs where I mapped the annotation layers. I then exported the dataset resulting from this extraction in CSV format.
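As an illustration only, the extraction step described above can be sketched as follows. The script actually used for the dataset is not reproduced here: the in-memory `tokens` and `layers` structures, the character offsets, the helper names `build_records` and `to_csv`, and the reduced column set are all assumptions standing in for annotations read from an XMI export. The token values and IDs are taken from Table 1.

```python
import csv
import io

# Hypothetical stand-in for annotations read from an XMI export.
# Offsets (begin, end) are invented for illustration; values come from Table 1.
tokens = [
    {"id": 40294, "begin": 0, "end": 7, "text": "πέλαγος"},
    {"id": 40823, "begin": 120, "end": 126, "text": "ὕδατος"},
]
layers = {
    "LEMMA": [
        {"begin": 0, "end": 7, "value": "πέλαγος"},
        {"begin": 120, "end": 126, "value": "ὕδωρ"},
    ],
    "POS": [
        {"begin": 0, "end": 7, "value": "NOUN"},
        {"begin": 120, "end": 126, "value": "NOUN"},
    ],
    "MEANING": [
        {"begin": 0, "end": 7, "value": "Literal"},
        {"begin": 120, "end": 126, "value": "Metonymic"},
    ],
}
COLUMNS = ["TOKEN", "LEMMA", "POS", "MEANING", "ID"]


def build_records(tokens, layers):
    """Return {token_id: {column: value}}, matching layer spans on offsets."""
    records = {}
    for tok in tokens:
        span = (tok["begin"], tok["end"])
        record = {"TOKEN": tok["text"], "ID": tok["id"]}
        for name, annotations in layers.items():
            # Take the first annotation covering exactly the token span.
            record[name] = next(
                (a["value"] for a in annotations if (a["begin"], a["end"]) == span),
                "",
            )
        records[tok["id"]] = record
    return records


def to_csv(records, columns):
    """Serialize the records, ordered by token ID, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for token_id in sorted(records):
        writer.writerow(records[token_id])
    return buffer.getvalue()


records = build_records(tokens, layers)
print(to_csv(records, COLUMNS))
```

Keying the dictionary on the token ID keeps one row per annotated token, whatever the number of layers attached to it.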
Sampling strategy
For this dataset, I decided to focus on two literary genres, i.e. historiography (Lat. De Bello Gallico by Caesar; AGr. Histories 1–2 by Herodotus) and epic poetry (Lat. Aeneid 1–6 by Vergil; AGr. Argonautica by Apollonius Rhodius). Given that I also wanted to investigate the distribution of SEA words in Ancient Greek and Latin, I selected parts of these texts depending on the final number of tokens. To maintain a balance between the Latin and Greek sub-corpora, some texts (Herodotus’s Histories and Vergil’s Aeneid) have not been fully annotated. Overall, my corpus has 174,501 tokens. The Greek sub-corpus has 92,592 tokens (53,750 for prose and 38,842 for poetry) and constitutes 53% of the whole corpus. The Latin sub-corpus has 81,909 tokens (51,313 for prose and 30,596 for poetry).
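The totals and proportions above can be checked with a few lines of Python. The token counts are those reported in this section; the variable names are illustrative.

```python
# Token counts per language and genre, as reported in the sampling strategy.
token_counts = {
    ("Greek", "prose"): 53_750,
    ("Greek", "poetry"): 38_842,
    ("Latin", "prose"): 51_313,
    ("Latin", "poetry"): 30_596,
}

total = sum(token_counts.values())

# Aggregate the genre counts into one figure per language.
by_language = {}
for (language, _genre), count in token_counts.items():
    by_language[language] = by_language.get(language, 0) + count

shares = {language: count / total for language, count in by_language.items()}

print(f"Total: {total} tokens")
for language, count in by_language.items():
    print(f"{language}: {count} tokens ({shares[language]:.1%})")
# Total: 174501 tokens
# Greek: 92592 tokens (53.1%)
# Latin: 81909 tokens (46.9%)
```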
(3) Dataset Description
The nouns, verbs, and adjectives included in this dataset are:
NOUNS: AGr. thálassa, póntos, pélagos, háls, Lat. mare, pontus, pelagus, aequor ‘sea’; AGr. húdōr, Lat. aqua, lympha ‘water’; AGr. háls, Lat. sal ‘sea’, ‘salt’; AGr. kûma, Lat. unda, fluctus ‘wave’; Lat. litus, ripa ‘shore’;[4]
VERBS: AGr. pléō (and its preverbed forms occurring in the analyzed texts), Lat. navigo ‘sail’;[5]
ADJECTIVES: AGr. thalássios, póntios, Lat. marinus, maritimus ‘maritime, marine’.[6]
In the CSV file, annotations are represented with ten columns and as many rows as there are SEA tokens in the texts considered. The columns provide: (1) the token (TOKEN); (2) its morphological analysis (MORPHOLOGICAL FEATURES); (3) its lemma (LEMMA); (4) its part of speech (POS); (5) the sentence in which the token is found (PASSAGE); (6) the type of token meaning (literal, metaphorical, or metonymic), according to cognitive linguistics and the new WordNets for ancient Indo-European languages (Biagetti et al., 2021) (MEANING); (7) its meaning in context using synsets from the WordNets, preceded by a unique identifier (SYNSET); (8) the token ID (ID); (9) any words (proper nouns or adjectives) in Ancient Greek or Latin to which a noun meaning ‘sea’ refers (REFERS TO); (10) the meaning of the phrase resulting from (1) and (9), using synsets from the WordNets, preceded by their unique identifier (DENOTES). An excerpt of the dataset is given in Table 1.
Table 1
An excerpt of the dataset (13 rows of Apollonius Rhodius’s Argonautica).
| TOKEN | MORPHOLOGICAL FEATURES | LEMMA | POS | PASSAGE | MEANING | SYNSET | ID | REFERS TO | DENOTES |
|---|---|---|---|---|---|---|---|---|---|
| ἁλὸς | Case=Gen\|Gender=Fem\|Number=Sing | ἅλς | NOUN | ἔνθ᾽ ἄρα τοίγε ἑςπέριοι ἀνέμοιο παλιμπνοίῃςιν ἔκελςαν, καί μιν κυδαίνοντες ὑπὸ κνέϕας ἔντομα μήλων κεῖαν, ὀρινομένης ἁλὸς οἴδματι | Literal | ‘n#06781694 a large body of water constituting a principal part of the hydrosphere’ | 25434 | | |
| πόντῳ | Case=Dat\|Gender=Masc\|Number=Sing | πόντος | NOUN | ἠῶθεν δ᾽ Ὁμόλην αὐτοςχεδὸν εἰςορόωντες πόντῳ κεκλιμένην παρεμέτρεον | Literal | ‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’ | 25716 | | |
| ἁλὸς | Case=Gen\|Gender=Fem\|Number=Sing | ἅλς | NOUN | λάρνακι δ᾽ ἐν κοίλῃ μιν ὕπερθ᾽ ἁλὸς ἧκε ϕέρεςθαι, αἴ κε ϕύγῃ | Literal | ‘n#06781694 a large body of water constituting a principal part of the hydrosphere’ | 26911 | | |
| πόντον | Case=Acc\|Gender=Masc\|Number=Sing | πόντος | NOUN | ἀλλὰ γὰρ ἔμπης ἦ θαμὰ δὴ πάπταινον ἐπὶ πλατὺν ὄμμαςι πόντον δείματι λευγαλέῳ, ὁπότε Θρήικες ἴαςιν | Metonymic | ‘n#06783379 the part of the sea that can be seen from the shore’ | 27311 | | |
| ἁλὶ | Case=Dat\|Gender=Fem\|Number=Sing | ἅλς | NOUN | περὶ γὰρ βαθυλήιος ἄλλων νήςων, Αἰγαίῃ ὅςαι εἰν ἁλὶ ναιετάουςιν | Metonymic | ‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’ | 36044 | [‘Αἰγαίῃ’] | [‘n#06806923 an arm of the Mediterranean between Greece and Turkey; a main trade route for the ancient civilizations of Crete and Greece and Rome and Persia’] |
| ἀναπλώοντι | Case=Dat\|Gender=Masc\|Number=Sing\|Tense=Pres\|VerbForm=Part\|Voice=Act | ἀναπλέω | VERB | εἰ δ᾽ οὔ μοι πέπρωται ἐς Ἑλλάδα γαῖαν ἱκέςθαι τηλοῦ ἀναπλώοντι, ςὺ δ᾽ ἄρςενα παῖδα τέκηαι | Literal | ‘v#01260993 travel by boat’ | 39277 | | |
| ὕδωρ | Case=Acc\|Gender=Neut\|Number=Sing | ὕδωρ | NOUN | ἔνθ᾽ ἄρα τοίγε κόπτον ὕδωρ δολιχῇςιν ἐπικρατέως ἐλάτῃςιν | Literal | ‘n#10771040 water containing salts’ | 39681 | | |
| ἅλα | Case=Acc\|Gender=Fem\|Number=Sing | ἅλς | NOUN | ὄϕρα δαέντες ἀρρήτους ἀγανῇςι τελεςϕορίῃςι θέμιςτας ςωότεροι κρυόεςςαν ὑπεὶρ ἅλα ναυτίλλοιντο | Literal | ‘n#06781694 a large body of water constituting a principal part of the hydrosphere’ | 39864 | | |
| πόντου | Case=Gen\|Gender=Masc\|Number=Sing | πόντος | NOUN | κεῖθεν δ᾽ εἰρεςίῃ Μέλανος διὰ βένθεα πόντου ἱέμενοι τῇ μὲν Θρῃκῶν χθόνα, τῇ δὲ περαίην Ἴμβρον ἔχον καθύπερθε | Literal | ‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’ | 40063 | [‘Μέλανος’] | [‘n#06810637 a sea between Europe and Asia; a popular resort area of eastern Europeans’] |
| πέλαγος | Case=Acc\|Gender=Neut\|Number=Sing | πέλαγος | NOUN | πέλαγος δὲ τὸ μὲν καθύπερθε λέλειπτο ἦρι | Literal | ‘n#06783080 an especially deep part of a sea or ocean’ | 40294 | | |
| ἅλα | Case=Acc\|Gender=Fem\|Number=Sing | ἅλς | NOUN | ἔςτι δέ τις αἰπεῖα Προποντίδος ἔνδοθι νῆςος τυτθὸν ἀπὸ Φρυγίης πολυληίου ἠπείροιο εἰς ἅλα κεκλιμένη | Literal | ‘n#06781694 a large body of water constituting a principal part of the hydrosphere’ | 40710 | | |
| ὕδατος | Case=Gen\|Gender=Neut\|Number=Sing | ὕδωρ | NOUN | ἐν δέ οἱ ἀκταὶ ἀμϕίδυμοι, κεῖνται δ᾽ ὑπὲρ ὕδατος Λἰςήποιο | Metonymic | ‘n#06789983 a large natural stream of water (larger than a creek)’ | 40823 | | |
| ἁλός | Case=Gen\|Gender=Fem\|Number=Sing | ἅλς | NOUN | ἠοῖ δ᾽ εἰςανέβαν μέγα Δίνδυμον, ὄϕρα καὶ αὐτοὶ θηήςαιντο πόρους κείνης ἁλός | Literal | ‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’ | 42805 | | |
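A minimal sketch of how the ten-column layout might be read in Python is given below. It relies on assumptions: the published file may differ in delimiter, quoting, and the exact serialization of the SYNSET, REFERS TO, and DENOTES values, and the helper names `parse_features` and `load_rows` are illustrative. The two sample rows reproduce values from Table 1, and an in-memory string is used in place of the actual file.

```python
import csv
import io

# Two rows copied from Table 1, serialized as comma-separated text.
# The serialization details of the published file are an assumption.
SAMPLE_CSV = """TOKEN,MORPHOLOGICAL FEATURES,LEMMA,POS,PASSAGE,MEANING,SYNSET,ID,REFERS TO,DENOTES
πέλαγος,Case=Acc|Gender=Neut|Number=Sing,πέλαγος,NOUN,πέλαγος δὲ τὸ μὲν καθύπερθε λέλειπτο ἦρι,Literal,n#06783080 an especially deep part of a sea or ocean,40294,,
ὕδατος,Case=Gen|Gender=Neut|Number=Sing,ὕδωρ,NOUN,"ἐν δέ οἱ ἀκταὶ ἀμϕίδυμοι, κεῖνται δ᾽ ὑπὲρ ὕδατος Λἰςήποιο",Metonymic,n#06789983 a large natural stream of water (larger than a creek),40823,,
"""


def parse_features(features: str) -> dict[str, str]:
    """Turn 'Case=Gen|Gender=Fem' into {'Case': 'Gen', 'Gender': 'Fem'}."""
    pairs = (item.split("=", 1) for item in features.split("|") if "=" in item)
    return {key: value for key, value in pairs}


def load_rows(handle):
    """Read all rows, replacing the feature string with a parsed dictionary."""
    rows = []
    for row in csv.DictReader(handle):
        row["MORPHOLOGICAL FEATURES"] = parse_features(row["MORPHOLOGICAL FEATURES"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Select the tokens annotated as metonymic and inspect one parsed feature.
metonymic_ids = [row["ID"] for row in rows if row["MEANING"] == "Metonymic"]
print(metonymic_ids)
print(rows[0]["MORPHOLOGICAL FEATURES"]["Case"])
```

To work with the actual file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, for instance `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded CSV is stored.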
Object name
25+ SEA words morpho-semantically annotated in Ancient Greek and Latin.
Format names and versions
CSV
Creation dates
From 2023-07-07 to 2023-08-10
Dataset creators
Andrea Farina (Department of Digital Humanities, King’s College London): conceptualization, data curation, methodology, formal analysis, data retrieval.
Language
Ancient Greek, Latin, English
License
CC0
Repository name
Figshare
Publication date
2023-08-18
(4) Reuse Potential
Given that this dataset describes the semantics of different words pertaining to the semantic field of SEA in Ancient Greek and Latin, its most immediate reuse potential lies in linguistics. The dataset supports both onomasiological analyses (from a concept to the words that express it) and semasiological analyses (from a word to the meanings it conveys). It can be expanded to include other works, authors, and literary genres, to give a broader overview of SEA words in Ancient Greek and Latin. Similar datasets may also be built for other semantic fields and/or languages, to allow for cross-linguistic comparisons either synchronically or diachronically. Moreover, this dataset could serve as the basis for training a model for automatic semantic annotation based on co-occurring words, which can be extracted from the passage in which a token occurs.
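As a hedged illustration of the two perspectives, the sketch below groups synset identifiers by lemma (semasiological view) and lemmas by synset identifier (onomasiological view). The pairs are copied from the first five rows of Table 1; the variable names are illustrative, and isolating the identifier from the rest of the SYNSET value is assumed to have been done beforehand.

```python
from collections import defaultdict

# (LEMMA, synset identifier) pairs from the first five rows of Table 1.
observations = [
    ("ἅλς", "n#06781694"),
    ("πόντος", "n#06781925"),
    ("ἅλς", "n#06781694"),
    ("πόντος", "n#06783379"),
    ("ἅλς", "n#06781925"),
]

senses_by_lemma = defaultdict(set)  # semasiological: word -> meanings
lemmas_by_sense = defaultdict(set)  # onomasiological: meaning -> words

for lemma, synset in observations:
    senses_by_lemma[lemma].add(synset)
    lemmas_by_sense[synset].add(lemma)

for synset, lemmas in sorted(lemmas_by_sense.items()):
    print(synset, sorted(lemmas))
```

On this small excerpt, the synset n#06781925 is expressed by two lemmas, while each of the two lemmas is attested with two different synsets.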
This dataset, or other similar datasets, may also be employed in literary-geographical studies, to evaluate, for instance, how a specific place, such as a sea, is referred to in different texts and/or geographical areas, synchronically or diachronically, and whether the proper noun of a sea tends to occur alone or with one or more common nouns. This may shed new light on geographical denominations in the ancient world. In this sense, it may also be used to expand existing online resources, such as Pelagios[7] (Simon et al., 2012; Barker et al., 2016; Simon et al., 2016; Kahn et al., 2021; Vitale et al., 2021), or to add further historical depth to the World Historical Gazetteer[8] (Manning & Mostern, 2015; Manning, 2015; Mostern, 2017), grouping together places that were known by more than one name.
Finally, more broadly, cross-linguistic analyses conducted in a cognitive framework also allow for psycho-anthropological studies that can address questions such as: How many words did the Greeks and the Romans possess to express one or more concepts related to SEA? How and why does the number of SEA words vary in Greek and Roman texts? How can we account for similarities and differences in this sense? Does this reveal anything about these populations from the cultural point of view?
Notes
[1] The results of this study were presented at ICHL26, the International Conference on Historical Linguistics (Heidelberg, Germany, 4–8 September 2023), by Andrea Farina, William Michael Short, and Barbara McGillivray.
[2] https://scaife.perseus.org (Last accessed: 27 October 2023).
[3] https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xmi (Last accessed: 27 October 2023).
[7] https://pelagios.org (Last accessed: 27 October 2023).
[8] https://whgazetteer.org (Last accessed: 27 October 2023).
Acknowledgements
I would like to thank Dr Barbara McGillivray for her valuable linguistic, computational, and stylistic feedback on this work. I also thank Paola Marongiu, who read a preliminary version of the paper.
Funding Information
This dataset has been produced to present a paper at the International Conference on Historical Linguistics (see fn. 1), whose expenses were covered by the AHRC London Arts & Humanities Partnership.
Competing interests
I am guest editor of the special collection Representing the Ancient World through Data and social media manager of this journal, and I did not take part in the editorial process pertaining to this manuscript.
