Linked Data, Fragmented Knowledge: Towards a Digital Quellenkritik of Verrius Flaccus’ Lexicon De uerborum significatu

By: Stephen Blair  
Open Access
Jan 2026

(1) Context and purpose

Though the study of the ancient world is currently undergoing a “digital turn”,1 scholarship in classical philology is still mostly done by methods of textual analysis developed and refined over the nineteenth and twentieth centuries. The potential of digital methods to advance scholarly research on ancient philological questions is only gradually entering the academic mainstream, and much of the field remains open, awaiting exploration.

Among the digital tools humanities scholars have shown interest in exploring, Wikidata stands out both for its tantalizing potential to integrate interdisciplinary knowledge into a linked, free, semantically expressive, human- and machine-readable knowledge base, and for its obvious shortcomings as a source. Specialists in a given field typically find the information in Wikidata broad, shallow, and generic, with a strong bias towards the recent and the well-known (Cook, 2017). Of course, a knowledge base like Wikidata, collaboratively built by a wide range of users with different priorities, worldviews, idiosyncrasies, and levels of expertise, will inevitably include some lopsided, sloppy, inconsistent, and inaccurate data: this inherent feature of mass collaboration necessitates a contextualizing “trust layer” between the database and the implementation of its content, tailored to each project’s aims, to make Wikidata usable for scholarly purposes (Santos et al., 2024).

Yet, despite the limitations of the current knowledge base for specialist purposes, Zhao’s (2023) recent “systematic review” of the uses of Wikidata by digital humanities projects found that humanities scholars involving Wikidata in their work primarily draw on it as a “content provider”; attempts to expand Wikidata’s humanities coverage collaboratively by integrating higher-quality specialist data sets into it, or to use Wikidata as a venue for publication and dissemination of data, are markedly rarer.

Given the nature of the data, it is perhaps unsurprising that lexicography stands out as an exceptionally fruitful sub-field among humanities researchers for exploring the potential of linked open data. The OntoLex lexicography module (lexicog) has facilitated new work on linked digital lexicography and has been shown to be adaptable enough to meet the needs of various types of lexica (Bellandi, 2025), though Wills (2021) has raised doubts about the efficacy of OntoLex in representing the data contained in historical lexica, which depend overwhelmingly on references to editions or manuscripts of early texts. Latin lexicography in particular has fortunately received a great deal of theoretical as well as practical attention. Lindemann et al. (2023) offer a useful overview of the modelling challenges inherent in using OntoLex to represent Latin lexicographical data in Wikibase. The Linking Latin (LiLa) project2 provides an interoperable knowledge base for Latin lexicography, incorporating data from the Lewis & Short Latin dictionary (Mambrini et al., 2022) and linked by a Wikidata property to lexicographical entities in Wikidata (Mambrini & Passarotti, 2023). This already robust structure can be profitably expanded by representing data from ancient lexicographical texts in Wikidata’s knowledge base with important potential consequences for philological research.

The present article is a methodological reflection on the utility of Wikidata for the analysis of a specific body of ancient textual evidence, aiming to address both the difficulty of profitably applying digital tools to problems of classical philology and the preference among scholars of digital humanities for drawing on Wikidata in research contexts rather than augmenting it. Taking as a case study some problems relating to the sources of the Latin lexicon De uerborum significatu (“On the Meaning of Words”) by the Augustan-era lexicographer M. Verrius Flaccus, the following discussion suggests that enriching the Wikidata knowledge base with strategically structured data drawn from the transmitted remains of Verrius’ lexicon would serve two purposes: 1) Since Verrius’ lexicon is an invaluable witness to many realia of the ancient world and preserves fragments of authors whose works are otherwise mostly or entirely unknown, linking relevant portions of Verrius’ text to Wikidata items corresponding to the figures and phenomena he attests will raise the quality of the knowledge base and improve referencing for a wide swath of information on ancient history and Latin literature. 2) Enabling queries in SPARQL Protocol and RDF Query Language (SPARQL) of a structured and semantically linked version of Verrius’ text through the Wikidata Query Service can reveal significant patterns that provide important insights into Verrius’ use of sources, which human source-critics applying similar strategies manually may have missed.

(2) Description of the lexicon and its transmission

Verrius Flaccus, a formerly enslaved ancient Roman scholar and antiquarian active under Augustus and Tiberius, was one of the most significant intellectuals of his time: having acquired a reputation for teaching, he was personally chosen by Augustus to tutor the princes, relocated to an office in the imperial palace, and paid a lavish salary.3 Alongside other scholarly and philological works and an annotated calendar inscribed on marble (some epigraphic fragments of which survive), Verrius’ crowning intellectual achievement was his monumental Latin encyclopaedia “On the Meaning of Words” (De uerborum significatu). Despite its title, the text was much more than a lexicon: it included a wealth of scholarly information on the development of the Latin language, Greek and Latin literature, Roman and Etruscan religion, cultural history, law, and much more, all organized into discrete lexicographical entries arranged in alphabetical order.4 The scale of the text was massive: though we do not know how long the original was, we know that the entries beginning with “A” filled at least four papyrus rolls, and entries beginning with “P” at least five. It was also exhaustively researched: Verrius’ lexicon cited hundreds of literary and scholarly texts, most of which have not been preserved intact. Verrius’ work was an unparalleled scholarly achievement and an invaluable resource, both for the details it recorded on the realities of the ancient world, and for the wealth of citations it preserved from ancient texts which have since been lost.

No manuscript of Verrius’ original text survives. This is among the most lamentable losses for our understanding of ancient antiquarian scholarship. But around the second century CE, an otherwise unknown scholar called Sextus Pompeius Festus undertook to abridge Verrius’ massive text, dramatically cutting it down to a mere 20 volumes. A single manuscript containing part of Festus’ abridgement of Verrius’ lexicon (from mid-M through mid-V) has survived, the so-called “Farnesianus”, currently in the Biblioteca Nazionale in Naples.5 The Farnesianus manuscript is badly damaged by fire: each page contains two columns of text, the outer column of which is mostly destroyed, so that only about half of the columns represented in the already partial manuscript are intact. When the manuscript surfaced in the early modern period, enthusiastic humanists took it apart and lost even more of it, though before doing so, some made copies (known as “apographs”) which can be used to reconstruct the more recently lost portions.

This manuscript is our only copy of Festus’ abridgement, incomplete and badly damaged. But an even shorter abridgement of Festus’ abridgement was made by Paul the Deacon at the court of Charlemagne. Comparison of Paul with the extant bits of Festus shows that Paul abbreviated ruthlessly, cutting out much of the rich detail, and many of the important citations of lost authors, which Festus had preserved from Verrius’ original. This third and shortest version of the lexicon by Paul survives intact.

We can thus read Verrius’ lexicon only at second hand (in incomplete and mutilated form) or at third. For the portions of the alphabet not covered in the partial Farnesianus, we must rely on Paul alone: where Paul and Festus overlap, Festus is superior in detail and provides more and better information, but Paul can sometimes be used to reconstruct the meaning of the damaged and illegible bits of Festus. Even in spite of dramatic reductions, these surviving abridgements of Verrius’ work fill over 500 pages in a print edition (Lindsay, 1913) and are an invaluable resource for modern scholars of the ancient world.

The rich data preserved in Festus and Paul has never been published in a structured form; this is a gap waiting to be filled. Encouragingly, attempts to digitize other works of ancient lexicographical scholarship and to integrate their content using the tools of linked open data are currently under way, though still inchoate. These projects may provide a methodological precedent for the work remaining to be done on Verrius. Already in 1998, a team of scholars conceived the idea of publishing an online collaborative edition of the Suda (an alphabetically arranged Byzantine encyclopaedia of literature, history, and realia) with translation, commentary, and structuring of the alphabetical entries into thematic categories (Finkel et al., 2000; Mahoney, 2009; “History”, 2019). The project was published in 2014 and remains a significant early milestone in the digital study of ancient lexicography. Work to integrate the content of the Suda into the epistemic landscape of linked open data is currently being carried out by the Linked Ancient Greek and Latin project under the leadership of Monica Berti at Leipzig University. The project uses semi-automatic methods to extract data on named entities (cited ancient authors, historical figures, ultimately place names) from the Suda and other texts, which are then linked with corresponding entities in external knowledge graphs (Berti, 2023, 2025). Though research is still ongoing, a preliminary catalogue of authors cited in the opening portion of the Suda is already available.6

The important work currently being done on the Suda will hopefully point the way forward for future integrations of ancient lexicographical content with linked open data. As a first step, extraction of named entities from Verrius, as Berti is currently doing for the Suda and other ancient corpora, would already provide a wealth of information to enrich the Wikidata knowledge base. Much information on ancient named entities occurs in the epitomes of Verrius and nowhere else. For example, Festus is the only ancient source to report the cause of death (P509) of the famous painter Zeuxis (Q197044), who painted a picture of an old woman which was so funny that he laughed himself to death by looking at it (Festus, p. 228 ll. 10–11).7 The valuable and rare information Verrius provides on systems of ancient knowledge goes well beyond named entities: integrating this material into Wikidata would simultaneously improve the Wikidata knowledge base and greatly facilitate navigation within and across ancient lexica for the study of ancient texts and the ancient world. For example, Festus and Paul report many entries on the various subclasses (P279) of lightning (Q33741) studied by the field of usage (P9488) of Etruscan divinatory science (Q3059369). Expanding the Wikidata knowledge base with items attested by Verrius and anchored to references within the lexicon would enable source-critical research based on correspondences within Verrius’ text (on which see below), study of the practices of ancient scholarship in general (by linking Verrian material to analogues in other ancient antiquarian texts), and interdisciplinary research on elements of the ancient world whose study is not exclusively text-focused (such as Etruscan religion).

(3) Method

(3.1) Wikidata and lexicographical works

Here a problem arises concerning the most profitable way of representing Verrius’ lexicographical content in Wikidata. Though the entries in De uerborum significatu span a wide range of disciplines, many of them are indeed strictly lexical. As the title indicates, a major, though not the only, aim of the text is to be a reference work for the study of lexemes within the Latin language. Entries in Verrius’ text with a lexical focus include a headword followed by any or all of the following: glosses, etymologies, references to other grammatical sources, quotations from ancient literary authors illustrating the use of the lexeme in context, statements describing unusual grammatical features, etc. All of this information could be unambiguously represented using Wikidata’s suggested best practices for entering lexicographical data in the form of “lexemes”: entities within the knowledge base whose identifier begins with an L, and which are distinct from Q-items in that they refer not to concepts that exist in the world independent of language, but to the linguistic apparatus used to describe the world of concepts.8 Lexemes can be linked to Q-items using statements. Statements pertaining to a lexeme can be referred to lexicographical sources either by creating a unique Wikidata property to link a lexeme to its identifier in a particular lexicographical work (Q56216056), or by using the property indicating description by a source (P1343).9

Yet, despite the formal compatibility between Verrius’ lexicon and a modern one, there are clear reasons for treating works of ancient scholarship by different standards. Ancient lexica occupy a curious middle ground, as they partially share in the epistemic patterns and forms of knowledge production which characterize modern “secondary” literature, while also operating according to pre-modern research conventions and simultaneously serving as primary witnesses to the ancient world. Wikidata’s documentation says sources must be “serious” and “reliable”;10 is Verrius a serious and reliable lexicographer? To paraphrase a famous article on the fundamental differences between ancient and modern research practices (Loraux, 1980): Verrius is not a colleague.

Rather than translate Verrius’ entries into Wikidata lexemes, I suggest that a more profitable method of representing Verrian data in the Wikidata knowledge base is to treat each entry of De uerborum significatu as an independent item with its own Q-identifier, an instance of the Wikidata concept for entries in reference works (Q3055347). This would both avoid the conflation of ancient and modern research standards and would enable new source-critical insights which only the querying of Verrian entries as independent entities will allow (on this see below).
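The proposal can be made concrete with a minimal Python sketch that assembles one entry as a Wikidata-style item. It assumes the entry would carry "instance of" (P31), "part of" (P361), and "main subject" (P921) statements; Q3055347 and Q33741 are the identifiers named in this article, while the identifier for the lexicon itself, the class name, and the helper function are placeholders invented for illustration rather than existing items or an agreed data model.

```python
from dataclasses import dataclass, field

INSTANCE_OF = "P31"                 # Wikidata property: instance of
PART_OF = "P361"                    # Wikidata property: part of
MAIN_SUBJECT = "P921"               # Wikidata property: main subject
REFERENCE_WORK_ENTRY = "Q3055347"   # class for entries in reference works

# Placeholder only: stands in for whatever item represents the lexicon.
LEXICON_ITEM = "Q_DE_UERBORUM_SIGNIFICATU"


@dataclass
class EntryItem:
    """One lexicon entry, modelled as an item with a dictionary of claims."""
    label: str
    locator: str                    # page and line in the print edition
    claims: dict = field(default_factory=dict)

    def add_claim(self, prop, value):
        # A property may carry several values, so each maps to a list.
        self.claims.setdefault(prop, []).append(value)


def make_entry_item(label, locator, subjects):
    """Build an entry item linked to the lexicon and to its subject items."""
    item = EntryItem(label=label, locator=locator)
    item.add_claim(INSTANCE_OF, REFERENCE_WORK_ENTRY)
    item.add_claim(PART_OF, LEXICON_ITEM)
    for qid in subjects:
        item.add_claim(MAIN_SUBJECT, qid)
    return item


entry = make_entry_item(
    label="Festus, passage on three types of lightning",
    locator="p. 284 ll. 9-13 (Lindsay 1913)",
    subjects=["Q33741"],            # lightning; a narrower item would be better
)
print(entry.claims)
```

Keeping the locator on the item itself is only one option; it could equally be attached as a reference to each statement drawn from the entry.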

There is already precedent for this sort of granular subdivision of items representing ancient texts in the knowledge base. The Iliad (Q8275), for example, coexists in Wikidata alongside its twenty-four component “books” and even the proem, each of which has its own unique identifier, linked in both directions to the whole text with the “part of” (P361) and “has part(s)” (P527) properties. The entire corpus of Horace’s Odes has a unique identifier (Q943884), as does each of Horace’s four books of Odes (Q106096014, Q106096015, Q106096017, Q106096016), as does the 37th poem of the first book (Q108274144).

There is also precedent for representing the content of historical lexica with individual Q-items corresponding to discrete entries. Each of the entries in the Diccionario de Arquitectura Civil (Q19430752), published in Madrid in 1802, appears in Wikidata as a unique item and an instance of “reference work entry” (Q3055347). Q19423512, for example, is the entry for cornijamento (found on p. 28 col. 2 of the print edition), which glosses the term as identical to cornisamento. The Wikidata item is linked to the full text of the entry on Wikisource, where it is likewise treated as a discrete unit.11 This same procedure applied to Verrius would be especially fruitful given the circumstances of the text’s composition and transmission.

(3.2) Source criticism, old and new

As mentioned above, the abridgements of Verrius’ text are valuable not only for the light they shed on various features of the ancient world, but also for the many precious citations of now-lost ancient authors that Verrius has rescued from oblivion. Unfortunately, most of the text as it is accessible to us has gone through two abridgements, and many valuable quotations perished on the cutting-room floor. Strzelecki’s (1932) comparison of Paul with the extant corresponding passages of Festus showed that in the abridgement process Paul omitted many valuable quotations from authorities still cited in Festus; Paul was particularly quick to delete quotations of prose authors. We can only assume that Festus, in abridging Verrius, took a similar approach. The text as it stands, particularly in the portions represented only by the shorter version of Paul, is a repository of information that Verrius had meticulously researched and carefully attributed to hundreds of source texts, but many of the attributions were deleted in the epitomizing process, leaving learned discussions in want of a citation.

Classical scholarship of the nineteenth century was, however, often confident—sometimes too confident—in its ability to detect the source lurking behind an ancient author’s text even in the absence of an explicit citation. The methods of “source criticism” (Quellenkritik, or Quellenforschung) were often rashly applied by scholars who treated ancient authors as mere copyists, plagiarizing what they had read with no attempt or ability to synthesize, analyse, and adapt.12 Fortunately, modern scholars are more circumspect in their applications of source criticism, which, applied with caution, can reveal significant patterns that shed light on the composition of a text. This is particularly true in the case of scholarly works such as lexica, which are more compilatory in nature than other ancient genres of writing, and which often bear more visible traces of work by multiple hands. Older source-critical scholarship on Verrius vastly enriched our understanding of the text by identifying thematic foci, verbal tics, and other patterns among the entries which may point to a common source lurking behind a given group (Reitzenstein, 1887; Strzelecki, 1932; Bona, 1964, 1982).

To return to an example used above, Verrius reports a great deal of information on subclasses of lightning as they were interpreted within the Etruscan system of divination. Reitzenstein (1887) was the first to call attention to a particular passage of Festus which describes three types of lightning and their correct interpretation in the following order: “postulatory lightning” (postularia fulgura) signifies that a religious rite has been performed inadequately and must be corrected; “baleful lightning” (pestifera) foretells death or exile; and “cancelling lightning” (peremptalia) negates the meaning of a previous lightning strike (Festus, p. 284 ll. 9–13). No source is cited for this information. But Reitzenstein noticed that these same three specialized types of lightning are described, in the same order and in almost exactly the same language, in a different ancient text (Seneca, Naturales Quaestiones 49.1–2), where this recondite information is explicitly attributed to the Roman scholar Aulus Caecina (Q772539). Caecina is elsewhere attested as a respected author specializing in Etruscan religion. This makes it overwhelmingly likely that the anonymous material in Festus and Paul derives from Caecina as well, and leads to the important conclusion that Caecina was among Verrius’ sources for Etruscan religion. One can further speculate whether other entries on types of lightning in Festus and Paul, if they resemble these in form, focus, tone, or vocabulary, may have likewise come from the same source.

Reitzenstein (1887) and Strzelecki (1932) further showed that Verrius’ sources appear in “clusters”: when a source is cited in a given entry, it frequently appears in the entries immediately preceding or following, especially in the less carefully alphabetized portions towards the end of a given letter (Müller, 1839). This is highly suggestive of a “data collection phase”, during which Verrius was casting dozens of new entries from research into a given source, and appending new entries to the end of a given letter’s material.
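The clustering observation lends itself to a simple mechanical check once entries and their cited sources are available as structured data. The Python sketch below is an illustration only: it assumes entries arrive as (identifier, set of cited sources) pairs in their transmitted order, and it reports every run of consecutive entries that cite the same source. The function name, the `min_length` threshold, and the toy data are all invented for the example and do not reproduce any passage of Festus or Paul.

```python
from itertools import groupby


def source_runs(entries, min_length=2):
    """Find runs of consecutive entries that cite the same source.

    entries: list of (entry_id, set_of_sources) in transmitted order.
    Returns a list of (source, [entry_ids]) tuples.
    """
    all_sources = sorted({s for _, sources in entries for s in sources})
    runs = []
    for source in all_sources:
        # Mark each entry by whether it cites this source, keeping the order.
        flagged = [(entry_id, source in sources) for entry_id, sources in entries]
        # groupby splits the sequence into maximal stretches of equal flags.
        for cited, group in groupby(flagged, key=lambda pair: pair[1]):
            ids = [entry_id for entry_id, _ in group]
            if cited and len(ids) >= min_length:
                runs.append((source, ids))
    return runs


# Invented toy data, not drawn from the lexicon.
toy_entries = [
    ("entry_1", {"source_A"}),
    ("entry_2", {"source_A", "source_B"}),
    ("entry_3", {"source_A"}),
    ("entry_4", set()),
    ("entry_5", {"source_B"}),
    ("entry_6", {"source_B"}),
]
print(source_runs(toy_entries))
```

A check of this kind only surfaces candidate clusters. Whether a run reflects a "data collection phase" or mere coincidence remains a matter for philological judgement.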

All of this demonstrates the important insights to be gained into the construction of ancient lexica and the sources and methods of ancient scholars by linking discrete lemmatic treatments of a given concept to related treatments within and beyond the text. Wikidata offers a robust and expressive knowledge base whose concepts pertaining to the study of the ancient world can be linked to the ancient lexicographical units that describe them. This would be doubly beneficial, both to Wikidata and to the field of classical philology. On the one hand, bolstering Wikidata’s coverage of niche topics like Etruscan divination with the vast primary source material contained in Festus and Paul would go a long way to deepening and enriching the knowledge base in this single specialist area. On the other, linking the informational tidbits transmitted by Verrius and other ancient lexicographers to existing or newly created Wikidata concepts would enable scholars to detect patterns, traces of lost sources, and intertextual correspondences by SPARQL query—correspondences which, so far, have only been visible to those scholars whose education and memory are so prodigious that they happen, by coincidence, to notice them.
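To give a sense of what such a query might look like, the following Python sketch builds a SPARQL query string and the matching request URL for the Wikidata Query Service. It assumes that Verrian entries have been modelled as instances of "reference work entry" (Q3055347) and linked to their topics with "main subject" (P921); that modelling is a suggestion, and at present the query would not return Verrian material. The function names are illustrative, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_entry_query(topic_qid, entry_class="Q3055347"):
    """Return a query for reference-work entries about a topic or its subclasses."""
    return f"""
SELECT ?entry ?entryLabel ?topic ?topicLabel WHERE {{
  ?entry wdt:P31 wd:{entry_class} ;
         wdt:P921 ?topic .
  ?topic wdt:P279* wd:{topic_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


def request_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Entries whose main subject is lightning (Q33741) or any subclass of it.
query = build_entry_query("Q33741")
url = request_url(query)
print(query)
```

The property path `wdt:P279*` follows "subclass of" links to any depth, so that entries on individual types of lightning would be found together with entries on lightning in general.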

Representing Verrian information in Wikidata would also enhance the lexicographical data already present through links to the LiLa knowledge base, particularly in the case of poorly attested lexemes. Since ancient lexicographers tend to focus on rare and obsolete words, it frequently happens that Verrius is our first or only attested usage of a given term for which a Wikidata lexeme already exists. In such cases it is important to link a lexical entity to the earliest or only ancient source that attests it. For example, the Wikidata lexeme mamphula (L1057142), a Latin noun referring to a type of Syrian bread, is linked by the property P11033 (LiLa Linking Latin URI) to the corresponding LiLa lemma 111443. This word is attested nowhere else in classical Latin apart from Verrius, and the fragment of the satirist Lucilius which Verrius quotes to illustrate the meaning of the term. Linking Verrian entries with a semantic focus to their corresponding lexemes in Wikidata will help to enrich strictly lexicographical data by embedding them within the deeper cultural and historical context ancient lexicographers describe (Figure 1).

Figure 1

A typical Verrian entry and a suggestion for representing it with Wikidata properties and concepts.

Finally, as the full transmitted text of the epitomes of Verrius has never been translated into English or any other modern language,13 supplementing the Wikidata knowledge base with human- and machine-readable statements derived from Verrius’ entries will make this valuable material accessible to a new generation of students and scholars and exploit the underused potential of Wikidata as a venue for the publication of research (Zhao, 2023).

(4) Challenges

Adding the wealth of information contained in works of ancient lexicography to the Wikidata knowledge base would not be without challenges. Perhaps the most serious of these is the representation of uncertainty. As the above discussion shows, almost every aspect of the study of Verrius’ text and its transmission is shrouded in doubt and must be approached with care and a thorough understanding of all contextual problems. To name just a few of the interpretative problems: 1) Some of the source-critical conclusions that can be drawn by observing patterns and tendencies in the text are probable, others merely represent one possibility among many, still others are extremely speculative. It is of paramount importance to express to users how firm or shaky are the supports on which a given attribution rests. 2) The fire damage to the Farnesianus renders nearly half of it illegible, with the result that we can only guess, with greater or lesser confidence, at the content of the many burned columns. 3) Even where the lexicon clearly and legibly attributes information to a named source, we are often uncertain which person is meant. A citation of “Cincius” may refer to the historian L. Cincius Alimentus (Q867394) or the antiquarian L. Cincius (Q2855302); “Ateius” may be Ateius Capito (Q727589) or Ateius Philologus (Q2868804), etc. 4) Though it is generally agreed that Festus and Paul were, above all, faithfully abbreviating and almost never adding to Verrius’ text, there are a few visible cases where post-Verrian material was certainly inserted by one of the two epitomizers. These include a quotation of St. Paul the apostle’s Letter to the Romans (p. 32 l. 16) and one from Ovid (p. 437 l. 7), which Paul added to Festus; and two quotations in Paul from post-Verrian texts (p. 506 l. 20; p. 31 l. 14), which may have been added by either Festus or Paul (on the extent of their interventions see Reitzenstein, 1887; Strzelecki, 1932). These few aberrations are exceptions to the rule, but we still cannot be entirely certain that a given entry in Festus or Paul is of genuine Verrian provenance rather than a later creation.

Wikidata is well equipped with a variety of tools to represent a statement’s uncertainty or disputedness. In a recent study of Wikidata’s handling of claims of “weaker logical status” in data from both the humanities and sciences, Di Pasquale et al. (2024) identified four main strategies for representing doubt, uncertainty, or controversy in Wikidata: 1) statements present in the knowledge base but not asserted, 2) ranked statements, 3) values indicating nonexistence or uncertainty (“someValue” and “noValue”), and 4) contextualizing qualifiers such as “sourcing circumstances” (P1480) or “nature of statement” (P5102). However, their study found that the use of Wikidata’s infrastructure to represent uncertainty is very poor and inconsistent in the humanities data surveyed. For example, only 0.4% of visual artworks in Wikidata report any uncertainty or disagreement whatsoever in the attribution, a suspiciously low figure by comparison with the database of the RKD – Nederlands Instituut voor Kunstgeschiedenis, where the comparable figure is 8.5%. The authors suggest that Wikidata’s greatly insufficient representation of the disputedness of facts may be due to the non-user-friendly modes of representing this information, and offer proposals for streamlining the process, including a certainty marker on a scale of 1 to 5 or 1 to 7. Even this criterion, being merely quantitative, would probably be insufficiently expressive to convey the various types of and reasons for uncertainty in problems of ancient lexicography. Without a doubt, a rigorously regulated and consistent set of best practices for representing scholarly doubt in humanities data would be a prerequisite for profitably expressing ancient lexicographical knowledge in Wikidata.
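Three of these strategies (ranks, unknown values, and qualifiers) can be illustrated with a small Python sketch applied to the ambiguous citation of “Cincius” mentioned above. It is a simplified model, not Wikidata's actual data format: the two candidate identifiers come from this article and P1480 is the "sourcing circumstances" property, but the property used for "cited authority" and the qualifier value marking an uncertain identification are placeholders invented for the example.

```python
from dataclasses import dataclass, field

RANKS = ("preferred", "normal", "deprecated")  # Wikidata's three statement ranks
SOME_VALUE = "somevalue"                       # marker for an unknown value

SOURCING_CIRCUMSTANCES = "P1480"               # Wikidata qualifier property
CITED_AUTHORITY = "P_CITED_AUTHORITY"          # placeholder, not a real property
UNCERTAIN_IDENTITY = "Q_UNCERTAIN_IDENTITY"    # placeholder, not a real item


@dataclass
class Statement:
    prop: str
    value: str
    rank: str = "normal"
    qualifiers: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.rank not in RANKS:
            raise ValueError(f"unknown rank: {self.rank}")

    @property
    def is_hedged(self):
        """True if the statement flags doubt by value, rank, or qualifier."""
        return (
            self.value == SOME_VALUE
            or self.rank == "deprecated"
            or SOURCING_CIRCUMSTANCES in self.qualifiers
        )


def ambiguous_citation(candidates):
    """One statement per candidate, each qualified as an uncertain identification."""
    return [
        Statement(
            prop=CITED_AUTHORITY,
            value=qid,
            qualifiers={SOURCING_CIRCUMSTANCES: [UNCERTAIN_IDENTITY]},
        )
        for qid in candidates
    ]


# "Cincius": L. Cincius Alimentus (Q867394) or the antiquarian L. Cincius (Q2855302).
cincius = ambiguous_citation(["Q867394", "Q2855302"])
for statement in cincius:
    print(statement.value, statement.rank, statement.is_hedged)
```

A boolean flag of this kind records that doubt exists but not why, so it shares the limitation noted above for a purely quantitative certainty marker.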

A further difficulty involved is the amount of human labour required to make the information contained in a print edition of Festus and Paul accessible via Wikidata. Given Wikidata’s bias towards mainstream knowledge in the humanities (Cook, 2017; Zhao, 2023), niche projects are particularly well poised to improve and expand Wikidata and may have more to gain from integration of their own specialist data with the wider knowledge base; it is, however, precisely these sorts of niche humanities projects which are likely to struggle to find collaborators or institutional funding. (It is worth remembering that the Suda On Line was prepared by a team of professional scholars who worked for 16 years between the conception of a digital edition and its publication.)

The critical edition by Lindsay (1913) is in the public domain, and the complete text of Paul’s shorter and much compressed version is part of the LiLa knowledge base.14 The richer and more detailed text of Festus is also available in digital form,15 but its plain-text version presents a series of problems. Valuable information has been lost in the process of digitizing Festus in several ways. First, Lindsay’s edition employed the space on the printed page to represent visually how and where the Farnesianus had been damaged, information which is lost in the transition to plain text. Second, editions of ancient texts are supplied with a highly compressed set of text-critical annotations called a “critical apparatus” (Q1665273), which alerts the reader to disagreements and variations in the manuscripts consulted by the editor in the preparation of the text, as well as other textual emendations or supplements proposed by the current editor or previous scholars: this is an indispensable tool for serious research into any ancient text whose transmission is in any way problematic. Although divergent readings in various manuscripts and proposals in scholarly publications can easily be expressed using existing Wikidata properties, representing the bulk of information contained in the critical apparatus on Wikidata would require an enormous expenditure of human labour, probably with very little benefit.

With the text prepared, a substantial amount of human labour would be required to represent a significant portion of its content intelligibly in Wikidata. A recent attempt to automate the representation of Old French lexico-semantic data in DBpedia showed promising results (Tittel, 2023). Again, Berti’s Suda project may point the way forward by initially focusing on named entity recognition, which lends itself more readily to automation; this, obviously, would have to be verified by human researchers, and in the case of Verrius would represent a relatively small portion of his scholarly breadth. But a systematic catalogue of historical figures and ancient sources named or cited by Verrius would already deeply enrich Wikidata’s knowledge of these characters and facilitate new research into ancient lexicographers’ working methods and intellectual world.

(5) Conclusion

A closer collaboration between Wikidata advocates and classical philologists working on ancient lexica would be mutually beneficial. On the one hand, supplementing the Wikidata knowledge base with high-quality specialist data will help to fill its gaps, correct its errors, and mitigate its bias towards the recent and well-known. On the other hand, linking the data expressed in the epitomes of Verrius to Wikidata concepts and other ancient treatments of them will likely reveal important correspondences human scholars may have missed, particularly towards a better understanding of ancient lexicographers’ sources and working methods. Two significant challenges remain to be negotiated before Wikidata can be effectively used for the study of ancient lexicography: 1) Wikidata’s inconsistent and insufficient use of extant markers of uncertainty and disputedness could lead to misleading information in the knowledge base, unless extreme care is taken to regulate and standardize best practices for clearly indicating scholarly disagreement and tentative speculation on problems of classical philology; and 2) the messy and irregular nature of the data would require a considerable amount of human labour and resources, which may not always be judged worthwhile in the case of niche research projects. On the second point, the current project on linked open data and the Suda, by focusing on semi-automatic extraction of named entities, is a promising example pointing the way forward. All in all, the potential offered by linked open data both for original research into ancient intellectual activity and for representing specialist knowledge to a broader audience has barely begun to be explored.

Notes

[1] Amid the increasingly prolific literature see e.g. Chronopoulos et al., 2020, and a recent special collection of this journal (https://openhumanitiesdata.metajnl.com/collections/representing-the-ancient-world, last accessed 30 December 2025).

[2] https://lila-erc.eu/ (last accessed 30 December 2025).

[3] Biographical details reported by Suetonius, De grammaticis et rhetoribus 17.

[4] I use the term “entries” to avoid any confusion that may arise from the distinct uses of the term “lemma” in ancient philology and Wikidata: while ancient philologists use the term “lemma” to refer to the entirety of an ancient lexicographer’s entry organized under a given headword, Wikidata documentation uses “lemma” for the headword itself (see https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation, last accessed 30 December 2025).

[5] IV.A.3 in the Naples library catalogue. Thewrewk de Ponor’s (1893) facsimile edition of the remaining portions of the manuscript is a valuable resource: https://doi.org/10.18452/420.

[6] https://www.lagl.org/tools/suda/ (last accessed 30 December 2025).

[7] All passages in the texts of Festus and Paul are cited by page and line number in the standard edition by Lindsay (1913).

[8] See Wikidata’s own documentation on the representation of lexicographical data: https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation (last accessed 30 December 2025).

[9] For more on this subject and other aspects of sourcing statements about lexemes, see Wikidata’s guide to lexicographical notability: https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Notability (last accessed 30 December 2025).

[10] “Serious”: www.wikidata.org/wiki/Wikidata:Lexicographical_data/Notability (last accessed 30 December 2025); “reliable”: https://www.wikidata.org/wiki/Help:Sources (last accessed 30 December 2025).

[11] https://es.wikisource.org/wiki/Cornijamento_(DAC) (last accessed 30 December 2025).

[12] This tendency was particularly egregious in the case of the Greek historian Diodorus of Sicily: on Diodorus and his victimization at the hands of source critics see e.g. Muntz, 2011.

[13] A French translation of Paul was made in the nineteenth century, which, however, did not translate the more detailed and useful text of Festus (Savagner, 1846), doubtless due to the damaged state of the text.

Competing Interests

The author has no competing interests to declare.

DOI: https://doi.org/10.5334/johd.455 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 3, 2025
Accepted on: Dec 11, 2025
Published on: Jan 13, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Stephen Blair, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.