The Linguistic Asymmetry Index (LAI): Benchmarking Equity in Multilingual Research Infrastructures

Elena Battaner; Paul Spence

doi:10.5334/johd.474

Abstract

Digital research infrastructures frequently describe themselves as multilingual, yet the degree to which languages are represented, documented, and made accessible depends on metadata policies and aggregation practices that can introduce unevenness across languages. This paper presents the Linguistic Asymmetry Index (LAI) as an experimental benchmarking framework for examining how such infrastructures organise linguistic visibility. The LAI defines linguistic asymmetry as a structural property arising from metadata design choices, institutional configurations, and access conditions, and operationalises it through five weighted components: Language Representation Asymmetry, English Anchor Bias, Metadata Completeness Disparity, Institutional Concentration Index, and Access Inequality Index. All five components are computed from openly available metadata using a reproducible workflow.
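The abstract names the five components but does not give their formulas or weights. As a minimal sketch under stated assumptions, the Python below treats the LAI as a weighted mean of five component scores, each assumed to be normalised to [0, 1] with higher values meaning more asymmetry. The weights are hypothetical (equal in the usage example), and `representation_asymmetry` is one plausible, assumed way to score the first component (one minus the normalised Shannon entropy of records per language). It is an illustration, not the authors' definition; all function names and toy counts are hypothetical.

```python
import math
from typing import Mapping

# Component keys mirror the five LAI components named in the abstract.
# The scoring formula and weights used below are ASSUMPTIONS for illustration.
COMPONENTS = (
    "language_representation_asymmetry",
    "english_anchor_bias",
    "metadata_completeness_disparity",
    "institutional_concentration_index",
    "access_inequality_index",
)


def representation_asymmetry(records_per_language: Mapping[str, int]) -> float:
    """Hypothetical score: 1 - (Shannon entropy / log of number of languages).

    0.0 means records are spread evenly over all listed languages;
    1.0 means every record belongs to a single language.
    Languages listed with zero records still count towards the maximum entropy.
    """
    n = len(records_per_language)
    total = sum(records_per_language.values())
    if n < 2 or total <= 0:
        raise ValueError("need at least two languages and at least one record")
    entropy = -sum(
        (c / total) * math.log(c / total)
        for c in records_per_language.values()
        if c > 0
    )
    return 1.0 - entropy / math.log(n)


def lai_composite(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of the five component scores (each assumed to lie in [0, 1])."""
    missing = set(COMPONENTS).difference(scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    total_weight = sum(weights[name] for name in COMPONENTS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in COMPONENTS) / total_weight


if __name__ == "__main__":
    # Toy record counts per language, invented purely for illustration.
    toy_counts = {"en": 800, "de": 120, "es": 50, "pl": 30}
    # Hypothetical equal weights and placeholder scores for the other components.
    equal_weights = {name: 1.0 for name in COMPONENTS}
    toy_scores = {name: 0.5 for name in COMPONENTS}
    toy_scores["language_representation_asymmetry"] = representation_asymmetry(toy_counts)
    print(f"toy LAI composite: {lai_composite(toy_scores, equal_weights):.3f}")
```

A weighted mean keeps the composite on the same [0, 1] scale as its components, so variants with different weights remain comparable. Characterising distinct asymmetry configurations, rather than ranking infrastructures, would still require inspecting the component profile and not only the single composite value.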

The LAI is applied to four digital infrastructures (CLARIN, Europeana, EUDAT/B2FIND, and OpenAIRE) to benchmark distinct configurations of linguistic asymmetry rather than to rank performance. The results illustrate how different technical and organisational arrangements correspond to specific asymmetry regimes, including disciplinary, institutional, technocratic, and systemic patterns. These configurations suggest that linguistic imbalance is shaped by infrastructural logics rather than by explicit multilingual policies.

Despite its current limitations, notably its dependence on publicly available metadata and its snapshot character, the LAI provides a prototype diagnostic benchmark that supports empirical observation of linguistic equity across heterogeneous systems. The approach is proposed as a basis for longitudinal monitoring and comparative analysis, complementing qualitative research on metadata governance and FAIR-aligned practices.
