Abstract
Digital research infrastructures frequently describe themselves as multilingual, yet the degree to which languages are represented, documented, and made accessible depends on metadata policies and aggregation practices that can treat languages unevenly. This paper presents the Linguistic Asymmetry Index (LAI) as an experimental benchmarking framework for examining how such infrastructures organise linguistic visibility. The LAI defines linguistic asymmetry as a structural property arising from metadata design choices, institutional configurations, and access conditions. It operationalises this definition through five weighted components, all computed from openly available metadata using a reproducible workflow: Language Representation Asymmetry; English Anchor Bias; Metadata Completeness Disparity; Institutional Concentration Index; and Access Inequality Index.
The LAI is applied to four digital infrastructures (CLARIN, Europeana, EUDAT/B2FIND, and OpenAIRE) to benchmark distinct configurations of linguistic asymmetry rather than to rank performance. The results illustrate how different technical and organisational arrangements correspond to specific asymmetry regimes, including disciplinary, institutional, technocratic, and systemic patterns. These configurations suggest that linguistic imbalance is shaped by infrastructural logics rather than by explicit multilingual policies.
Despite its current limitations, including its dependence on publicly available metadata and its snapshot character, the LAI provides a prototype diagnostic benchmark that supports empirical observation of linguistic equity across heterogeneous systems. The approach is proposed as a basis for longitudinal monitoring and comparative analysis, complementing qualitative research on metadata governance and FAIR-aligned practices.
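
As an illustration of how five weighted components might be combined into a single index, the following minimal sketch computes a normalised weighted mean. The abstract does not specify the aggregation rule, the weights, or the scale of the components, so everything beyond the five component names is an assumption: the function name `composite_index`, the [0, 1] score range, and the weighted-mean formula are hypothetical and are not taken from the paper.

```python
from typing import Dict

# Component names follow the abstract; the identifiers themselves are
# hypothetical labels chosen for this sketch.
COMPONENTS = (
    "language_representation_asymmetry",
    "english_anchor_bias",
    "metadata_completeness_disparity",
    "institutional_concentration_index",
    "access_inequality_index",
)


def composite_index(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine component scores into one value via a weighted mean.

    Assumptions (not stated in the source): each score lies in [0, 1],
    weights are non-negative, and aggregation is a weighted arithmetic mean.
    """
    missing = [c for c in COMPONENTS if c not in scores or c not in weights]
    if missing:
        raise ValueError(f"missing components: {missing}")

    for name in COMPONENTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"score for {name} must lie in [0, 1]")
        if weights[name] < 0.0:
            raise ValueError(f"weight for {name} must be non-negative")

    total_weight = sum(weights[name] for name in COMPONENTS)
    if total_weight <= 0.0:
        raise ValueError("weights must sum to a positive value")

    # Dividing by the total weight keeps the result in [0, 1]
    # even when the weights do not sum to one.
    weighted_sum = sum(weights[name] * scores[name] for name in COMPONENTS)
    return weighted_sum / total_weight
```

Under these assumptions, the function would be called once per infrastructure with that infrastructure's component scores. Because the paper frames the LAI as a way to compare configurations rather than to rank performance, the component scores are likely more informative than any single aggregate of this kind.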
