Evaluation of Quality of Slovak Language Use in LLMs

Marek Dobeš

doi:10.2478/aei-2025-0004



Acta Electrotechnica et Informatica

Volume 25 (2025): Issue 1 (March 2025)


Open Access

Feb 2025

RADFORD, A. ‒ WU, J. ‒ CHILD, R. ‒ LUAN, D. ‒ AMODEI, D. ‒ SUTSKEVER, I. (2019). Language models are unsupervised multitask learners.
PAPINENI, K. ‒ ROUKOS, S. ‒ WARD, T. ‒ ZHU, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318).
LIN, C. Y. (2004). ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out, 74-81.
CHINCHOR, N. (1991). MUC-3 evaluation metrics and linguistic phenomena tests. In Natural Language Processing Systems Evaluation Workshop (p. 13).
ZHANG, T. ‒ KISHORE, V. ‒ WU, F. ‒ WEINBERGER, K. Q. ‒ ARTZI, Y. (2019). BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.
ZHAO, W. ‒ PEYRARD, M. ‒ LIU, F. ‒ GAO, Y. ‒ MEYER, C. M. ‒ EGER, S. (2019). MoverScore: Text generation evaluating with contextualized embeddings and Earth Mover Distance. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 563-578.
BROWN, T. B. ‒ MANN, B. ‒ RYDER, N. ‒ SUBBIAH, M. ‒ KAPLAN, J. ‒ DHARIWAL, P. ‒ AMODEI, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
WEI, J. ‒ WANG, X. ‒ SCHUURMANS, D. ‒ BOSMA, M. ‒ ICHTER, B. ‒ XIA, F. ‒ LE, Q. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
MAYNEZ, J. ‒ NARAYAN, S. ‒ BOHNET, B. ‒ MCDONALD, R. (2020). On faithfulness and factuality in abstractive summarization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1906-1919.
SHENG, E. ‒ CHANG, K. W. ‒ NATARAJAN, P. ‒ PENG, N. (2021). Societal biases in language generation: Progress and challenges. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 4275-4293.
GEHMAN, S. ‒ GURURANGAN, S. ‒ SAP, M. ‒ CHOI, Y. ‒ SMITH, N. A. (2020). RealToxicityPrompts: Evaluating neural toxic degeneration in language models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3356-3369.
POPOVIĆ, M. (2017). chrF++: words helping character n-grams. Proceedings of the Second Conference on Machine Translation, 612-618.
JOSHI, P. ‒ SANTY, S. ‒ BUDHIRAJA, A. ‒ BALI, K. ‒ CHOUDHURY, M. (2020). The state and fate of linguistic diversity and inclusion in the NLP world. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6282-6293.
BEDNÁR, P. ‒ DOBEŠ, M. ‒ GARABÍK, R. (2024). Training of large language model Mistral on Slovak language data. Jazykovedný časopis. Under review.
VAN DER LEE, C. ‒ GATT, A. ‒ VAN MILTENBURG, E. ‒ WUBBEN, S. ‒ KRAHMER, E. (2019). Best practices for the human evaluation of automatically generated text. Proceedings of the 12th International Conference on Natural Language Generation, 355-368.
CHIANG, W.-L. ‒ LI, Z. ‒ LIN, Z. ‒ SHENG, Y. ‒ WU, Z. ‒ ZHANG, P. ‒ ZHANG, C. (2023). Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality. https://lmsys.org/blog/2023-03-30-vicuna/
KOCMI, T. ‒ FEDERMANN, C. (2023). Large language models are state-of-the-art evaluators of translation quality. arXiv preprint arXiv:2302.14520.
BENDER, E. M. ‒ GEBRU, T. ‒ MCMILLAN-MAJOR, A. ‒ SHMITCHELL, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). ACM.