Wikidata and LiLa for Latin: Enabling Interoperability and Access to Inflected Forms and Corpus Attestations

David Lindemann; Matteo Pellegrini; Francesco Mambrini; Marco Passarotti

doi:10.5334/johd.464

Abstract

This paper presents an approach to integrating Latin inflected forms and corpus attestations within a Linked Open Data (LOD) framework, enhancing interoperability between Wikidata and the LiLa knowledge base. Building on the PrinParLat lexicon of Latin verb principal parts, we generate the complete set of inflected forms for over 8,000 verbs, encoded as RDF in a dedicated Wikibase instance. These forms are linked to the Index Thomisticus Treebank (ITTB), whose morphologically annotated tokens are related to corresponding forms based on segmental identity, lemma alignment, and mapped morphological features. Our generation and linking process achieves over 95% coverage of ITTB verbal tokens, demonstrating the robustness of our pipeline even for Medieval Latin data. By aligning Paralex, Wikidata, and LiLa ontologies, we ensure semantic interoperability and facilitate future integration into Wikidata. Beyond Latin, this workflow provides a reproducible model for linking inflectional paradigms and corpus attestations in other languages.
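The token-to-form linking criteria named above (segmental identity, lemma alignment, mapped morphological features) can be illustrated with a minimal sketch. It is not the authors' pipeline: the identifiers, the feature labels, and the `TAG_MAP` table are hypothetical stand-ins for the actual ITTB, PrinParLat, and LiLa vocabularies.

```python
from typing import NamedTuple

class Form(NamedTuple):
    form_id: str
    lemma_id: str
    written: str
    features: frozenset  # labels in the (hypothetical) lexicon tagset

class Token(NamedTuple):
    token_id: str
    lemma_id: str
    written: str
    features: frozenset  # labels in the (hypothetical) corpus tagset

# Assumed mapping from corpus feature labels to lexicon feature labels.
TAG_MAP = {"Pres": "prs", "Ind": "ind", "Act": "act", "Sing": "sg", "Plur": "pl", "3": "3"}

def map_features(corpus_feats):
    """Translate corpus labels into lexicon labels, dropping unmapped ones."""
    return frozenset(TAG_MAP[f] for f in corpus_feats if f in TAG_MAP)

def build_index(forms):
    """Index generated forms by (lemma, lowercased string, feature set)."""
    index = {}
    for f in forms:
        key = (f.lemma_id, f.written.lower(), f.features)
        index.setdefault(key, []).append(f.form_id)
    return index

def link_tokens(tokens, forms):
    """Link a token only when lemma, string, and mapped features all agree."""
    index = build_index(forms)
    links = {}
    for t in tokens:
        key = (t.lemma_id, t.written.lower(), map_features(t.features))
        links[t.token_id] = index.get(key, [])
    return links

def coverage(links):
    """Share of tokens that received at least one form link."""
    if not links:
        return 0.0
    return sum(1 for ids in links.values() if ids) / len(links)

# Toy data: two generated forms of one verb and two corpus tokens.
FORMS = [
    Form("f1", "lemma:amo", "amat", frozenset({"prs", "ind", "act", "sg", "3"})),
    Form("f2", "lemma:amo", "amant", frozenset({"prs", "ind", "act", "pl", "3"})),
]
TOKENS = [
    Token("t1", "lemma:amo", "Amat", frozenset({"Pres", "Ind", "Act", "Sing", "3"})),
    Token("t2", "lemma:amo", "amavit", frozenset({"Perf", "Ind", "Act", "Sing", "3"})),
]

links = link_tokens(TOKENS, FORMS)
print(links, coverage(links))
```

In this toy run the first token matches a generated form on all three criteria, while the second stays unlinked because its string has no generated counterpart. The share of linked tokens is the kind of coverage figure the abstract reports.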
