Application of NLP Technologies to Low-Resource Croatian Dialects
Abstract
Natural language processing (NLP) systems tend to perform worse on texts written in low-resource dialects than on texts written in the standard language. Dependency parsing is an essential component of NLP systems, so improving it could improve overall system performance. This paper compares the performance of Slovenian and Croatian parsers on dependency parsing of the Kajkavian dialect in order to assess the Slovenian parser’s potential for parsing Kajkavian. A dependency parsing dataset was created from parallel translations of the book „Mali kraljević“. Labels were then projected from the parsed standard Croatian text onto the Kajkavian text, yielding the reference data needed to compute the unlabeled and labeled attachment scores (UAS and LAS) of the Croatian and Slovenian parsers, both of which were implemented using the open-source spaCy library. The Croatian parser achieved a UAS of 0.47 and a LAS of 0.30, lower than the Slovenian parser’s 0.52 and 0.34, respectively. The results indicate that the Slovenian parser is more accurate on the Kajkavian dialect. However, the dataset would need to be expanded before a general conclusion can be drawn.
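To make the two metrics concrete, the following minimal sketch computes UAS and LAS under their standard definitions: UAS is the fraction of tokens whose predicted head matches the reference head, and LAS additionally requires the dependency label to match. The function name `attachment_scores` and the toy sentence are hypothetical and chosen only for illustration; this is not the evaluation code used in the paper, and it assumes the reference and predicted analyses share the same tokenization.

```python
from typing import List, Tuple

# One token's analysis: (index of its head token, dependency label).
# Head index 0 denotes the artificial root. All data below is made up.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned reference and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    # UAS counts tokens whose head is correct, regardless of label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head AND label are both correct.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return head_hits / n, labeled_hits / n


# Toy four-token sentence (hypothetical reference and parser output).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "amod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In this toy example three of the four heads are correct, but only two tokens have both the correct head and the correct label, so LAS can never exceed UAS.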
© 2026 Maja Polanec, Marina Bagić Babac, published by Međimurje University of Applied Sciences in Čakovec
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.