Extending CLDF — Towards a Type System for Cross-Linguistic Data

Open Access | Apr 2026

DOI: https://doi.org/10.5334/johd.517 | Journal eISSN: 2059-481X
Language: English
Page range: 62 - 62
Submitted on: Jan 30, 2026
Accepted on: Mar 26, 2026
Published on: Apr 29, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Robert Forkel, Johann-Mattis List, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.