Era- and Genre-Specific Stop Word Lists for Low-Resource Computational Research: A Classical Latin Exemplum

Open Access | Nov 2024

Figures & Tables

Table 1

Latin literature, although often treated as a single corpus in computational research, in fact covers a wide range of genres and a time span of more than a millennium (see Conte 1994).

| TIME PERIOD | REPUBLICAN LATIN | AUGUSTAN LATIN | IMPERIAL LATIN | LATE ANTIQUE & CHRISTIAN LATIN |
| --- | --- | --- | --- | --- |
| ‘Classical Latin’ | | | | |
| Approximate Start | 2nd C. BCE | mid-1st C. BCE | 1st C. CE | 2nd C. CE |
| Approximate End | mid-1st C. BCE | early 1st C. CE | 3rd C. CE | |
| Example Authors & Works by Genre | | | | |
| Epic poetry | Lucretius | Vergil, Ovid | Statius, Lucan | Dracontius |
| Theater | Plautus, Terence | Pantomime | Seneca | Hrotsvitha |
| Small-format poetry (e.g., elegy) | Catullus | Horace, Propertius, Tibullus, Ovid | Statius, Martial | Anthologia Latina |
| Historiography & Military History | Caesar, Sallust | Livy | Tacitus | Ammianus Marcellinus |
| Other prose | Cicero, Varro, Atticus | Flaccus, Vitruvius, Pompey Trogue | Petronius, Pliny, Quintilian, Suetonius, Apuleius | Nonius, Jerome, Tertullian, Other apologists, Acta Martyrum and Passiones |
Table 2

Summary of words included in existing Latin stop lists. “Y” indicates inclusion on a list; “N” indicates absence. CLTK-M: stop words by mean frequency; CLTK-V: stop words by variance; CLTK-E: stop words by entropy probability; CLTK-B: stop words by Borda count; ISO: stop words from the International Standardization Organization; PDL: stop words from the Perseus Digital Library. More details available in the file previous_stop_lists.py.

| WORDS | CLTK-M | CLTK-V | CLTK-E | CLTK-B | ISO | PDL |
| --- | --- | --- | --- | --- | --- | --- |
| adhic, aliqui, aliquis, an, cur, deinde, es, etsi, fio, haud, idem, infra, interim, is, mox, necque, o, ob, possum, quare, quicumque, quilibet, quisnam, quisquam, quisque, quisquis, quoniam, sive, sui, sum, suus, trans, tum, unus | N | N | N | N | N | Y |
| a, e, erant, re, rebus, rem, tandem, vel | N | N | N | N | Y | N |
| at | N | N | N | N | Y | Y |
| contra, cuius, tantum | N | N | Y | N | N | N |
| magis | N | N | Y | N | N | Y |
| anno, deo, dicitur, dixit, dominus, ed, nummus, rex, totus | N | Y | N | N | N | N |
| super | N | Y | N | N | N | Y |
| bellum, bibit, dig, nouus, od, quaestio, uos | N | Y | N | Y | N | N |
| eorum | Y | N | N | N | N | N |
| cui, omnibus, sua | Y | N | Y | N | N | N |
| apud, igitur | Y | N | Y | N | N | Y |
| res | Y | N | Y | N | Y | N |
| ei, nobis, omnes, potest, quos, sine | Y | N | Y | Y | N | N |
| modo, quis, tam, ubi | Y | N | Y | Y | N | Y |
| dei, deus, secundum | Y | Y | N | Y | N | N |
| ea, eius, eo, esse, esset, eum, fuit, his, id, illa, mihi, nihil, nunc, omnia, quem, quid, quoque, se, sibi, sicut, sit, tibi | Y | Y | Y | Y | N | N |
| ante, ego, enim, ergo, iam, ille, inter, ipse, nam, ne, nisi, nos, post, pro, quia, sub, tu, uel, uero | Y | Y | Y | Y | N | Y |
| erat, haec, hoc, me, qua, quibus, quod, sunt, te | Y | Y | Y | Y | Y | N |
| ab, ac, ad, atque, aut, autem, cum, de, dum, est, et, etiam, ex, hic, in, ita, nec, neque, non, per, quae, quam, qui, quidem, quo, sed, si, sic, tamen, ut | Y | Y | Y | Y | Y | Y |
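
The inclusion pattern in Table 2 can also be read programmatically. The following is a minimal sketch, not the contents of previous_stop_lists.py: it copies five rows from Table 2 into a dictionary and reports which lists contain each word and how many of the six agree. The names `LISTS`, `ROWS`, `lists_containing`, and `agreement` are hypothetical and chosen only for this illustration.

```python
# Column order follows Table 2.
LISTS = ("CLTK-M", "CLTK-V", "CLTK-E", "CLTK-B", "ISO", "PDL")

# Five rows copied from Table 2: word -> Y/N flags in column order.
# This is an illustrative excerpt, not the full table.
ROWS = {
    "at": "NNNNYY",
    "super": "NYNNNY",
    "res": "YNYNYN",
    "erat": "YYYYYN",
    "et": "YYYYYY",
}


def lists_containing(word):
    """Return the names of the stop lists that include `word`."""
    flags = ROWS[word]
    return [name for name, flag in zip(LISTS, flags) if flag == "Y"]


def agreement(word):
    """Return how many of the six lists include `word`."""
    return len(lists_containing(word))


# Print the excerpt from most to least widely shared.
for word in sorted(ROWS, key=agreement, reverse=True):
    names = ", ".join(lists_containing(word))
    print(f"{word:6} {agreement(word)}/{len(LISTS)}  {names}")
```

Sorting by agreement makes the disagreement between the lists easy to see: in this excerpt only "et" appears on all six lists, while "at" appears on ISO and PDL alone.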
Table 3

Stop list targeted to Classical Latin poetry.

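CLASSICAL LATIN POETRY STOP LIST
| a | contra | ex | ita | noster | qua | quod | super |
| ab | cum | facio | magis | nunc | quam | quoque | tam |
| abs | de | fero | magnus | nullus | que | res | tamen |
| ac | deus | habeo | meus | ob | qui | se | tantus |
| ad | dico | hic | modo | omnis | quia | sed | totus |
| aliquis | do | iam | multus | per | quicumque | si | tu |
| alius | dum | idem | nam | post | quid | sic | tuus |
| ante | e | igitur | ne | possum | quidam | sicut | vel |
| apud | ego | ille | nec | praeter | quidem | sine | vero |
| at | enim | in | neque | pro | quis | sub | vester |
| atque | ergo | inter | nihil | prope | quisquam | sui | vos |
| aut | et | intra | nisi | propter | quisque | sum | ubi |
| autem | etiam | ipse | nos | qualis | quisquis | supra | ultra |
| circa | etiamnum | is | non | quantus | quo | suus | ut |
| | | | | | | | uti |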
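
To illustrate how a list like the one in Table 3 might be applied, here is a minimal sketch that filters the opening line of the Aeneid. It uses only a ten-word subset of Table 3 and a naive regular-expression tokenizer; the names `STOP_WORDS`, `tokenize`, and `remove_stop_words` are hypothetical and are not taken from the authors' scripts.

```python
import re

# Ten words taken from Table 3, for illustration only;
# a real run would use the full stop list.
STOP_WORDS = {"a", "ab", "ad", "cum", "et", "in", "que", "qui", "sed", "ut"}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop list."""
    return [token for token in tokens if token not in stop_words]


line = "Arma virumque cano, Troiae qui primus ab oris"
content_words = remove_stop_words(tokenize(line))
print(content_words)
# ['arma', 'virumque', 'cano', 'troiae', 'primus', 'oris']
```

Note that "virumque" survives even though "que" is on the list, because this tokenizer does not split enclitics. Matching "que" would presumably require enclitic splitting or lemmatization before filtering, which this sketch does not attempt.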
| FILE NAME | FORMATS | DESCRIPTION |
| --- | --- | --- |
| classical_latin_poetry_stop_words_unlemmatized | .txt, .csv | Unlemmatized stop list for classical Latin poetry |
| classical_latin_poetry_stop_words_lemmatized | .txt, .csv | Lemmatized stop list for classical Latin poetry |
| latin_stop_list | .py | Python script for importing stop lists |
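
A plain-text stop list can be read with a few lines of standard-library Python. The sketch below is not the interface of latin_stop_list.py, which is not shown here; it assumes, without confirmation from the source, that the .txt files hold one word per line. The function `load_stop_list` and the demo file name are hypothetical.

```python
import tempfile
from pathlib import Path


def load_stop_list(path):
    """Read a stop list into a set.

    Assumes one word per line (an assumption about the file layout);
    blank lines are skipped and words are lowercased.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


# Demonstration with a throwaway file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_stop_words.txt"
    demo_path.write_text("ab\nad\n\net\n", encoding="utf-8")
    stop_words = load_stop_list(demo_path)

print(sorted(stop_words))
# ['ab', 'ad', 'et']
```

Returning a set rather than a list keeps membership tests fast when filtering a large corpus token by token.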
DOI: https://doi.org/10.5334/johd.246 | Journal eISSN: 2059-481X
Language: English
Submitted on: Sep 28, 2024
Accepted on: Nov 12, 2024
Published on: Nov 25, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Rachel E. Dubit, Annie K. Lamar, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.