
Beyond Accuracy: Cross-Linguistic Equity and Socio-Technical Dimensions of Large Language Models

Open Access | Feb 2026

Abstract

Artificial intelligence (AI) and AI-based systems are rapidly gaining popularity across all areas of daily life. Among these systems, large language models (LLMs), which model language probabilistically to understand and generate text, stand at the forefront. Because language is the primary focus of LLMs, their ability to generate reliable results is of significant technical and social importance. As linguistic diversity increases, however, the ability of LLMs to produce stable and consistent results declines. This decline is closely related to model size, the scope of the training data, and the prompting technique used during response generation. To this end, a study was conducted to measure the success of LLMs across different languages. Four LLMs were examined: three open-source (DeepSeek-Coder-6.7B-Instruct, Qwen2.5-Coder-7B-Instruct, Llama-3.1-8B-Instruct) and one closed-source (GPT-5). These models were evaluated using the HumanEval-XL dataset across seven natural languages with different data sources and usage prevalence. In addition, the effects of the prompting technique and of the human development index (HDI) values of the countries where the languages are spoken were analysed. The results show that as LLMs grow, performance differences between languages decrease. Whether a model is open-source or closed-source was also observed to have a significant impact on performance. Among the open-source LLMs, DeepSeek-Coder-6.7B-Instruct's accuracy ranges from 37 % to 60 %, while Qwen2.5-Coder-7B-Instruct and Llama-3.1-8B-Instruct perform more consistently, in the 95–99 % range. GPT-5, the closed-source LLM, demonstrates balanced accuracy across all languages. These findings have notable implications for ethics, the quantity of available linguistic data, and equality of access to technology.
The results also clearly demonstrate the relationship between multilingual accuracy, language prevalence, and prompting techniques. In this way, the study offers a clearer and more comprehensive understanding of linguistic justice and the generalisation of LLMs in the field of AI.

DOI: https://doi.org/10.2478/acss-2026-0001 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 1 - 16
Submitted on: Nov 20, 2025 | Accepted on: Feb 2, 2026 | Published on: Feb 18, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: Volume open

© 2026 Fidan Kaya Gülağız, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.

Volume 31 (2026): Issue 1 (January 2026)