
Extending CLDF — Towards a Type System for Cross-Linguistic Data
Abstract
We argue that, to maximize the reusability of cross-linguistic data, it is useful to think about such data in terms of a type system. Type systems are enforceable rules guiding the interpretation of data in computer programs; they thus link data values to the valid operations that can be performed on them. The reusability of research data is determined largely by the availability of suitable analysis methods. A clear idea of cross-linguistic data types will enable the development of analysis methods as well as a mechanism to match valid data with appropriate operations. The Cross-Linguistic Data Formats (CLDF) initiative provides a toolkit to model such cross-linguistic data types, and in recent years a paradigm (and an associated process) has emerged for adding new types to CLDF through stepwise conventionalization. Additionally, data types provide a useful selection criterion to group datasets for unified curation. Thus, a type system for cross-linguistic data will provide actionable metadata to guide data curation and inform data reuse.
© 2026 Robert Forkel, Johann-Mattis List, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.