Tailored Fine-tuning for Comma Insertion in Czech

Abstract

Transfer learning techniques, particularly pre-trained Transformers trained on vast amounts of text in a given language, can be tailored to specific grammar-correction tasks such as automatic punctuation correction. The Czech pre-trained RoBERTa model demonstrates outstanding performance on this task (Machura et al. 2022); however, previous attempts to improve the model have so far led to a slight degradation (Machura et al. 2023). In this paper, we present a more targeted fine-tuning of this model, addressing linguistic phenomena that the base model overlooked. Additionally, we provide a comparison with other models trained on a more diverse dataset that goes beyond web texts.
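To make the setup concrete, comma insertion with a pre-trained Transformer is typically framed as binary token classification: for each token, predict whether a comma should follow it. The sketch below shows one common way to wire this up with the Hugging Face Transformers library; the checkpoint name "ufal/robeczech-base" and the helper function are illustrative assumptions, not the authors' actual pipeline or fine-tuned weights.

```python
# A minimal sketch of comma insertion as binary token classification,
# assuming a Czech RoBERTa-style checkpoint with a fast tokenizer.
# The classification head is randomly initialized here; it only produces
# meaningful predictions after fine-tuning on comma-labeled data.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "ufal/robeczech-base"  # illustrative base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Two labels per token: 0 = no comma after this token, 1 = comma after it.
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def insert_commas(words):
    """Predict, for each word, whether a comma should follow it."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits          # shape: (1, seq_len, 2)
    preds = logits.argmax(dim=-1)[0].tolist()
    out = []
    word_ids = enc.word_ids(0)                # maps subword tokens to word indices
    for i, word in enumerate(words):
        out.append(word)
        # Read the label of the first subword token of this word.
        token_idx = word_ids.index(i)
        if preds[token_idx] == 1:
            out[-1] += ","
    return " ".join(out)

# After fine-tuning, this should yield "Myslím, že to funguje".
print(insert_commas(["Myslím", "že", "to", "funguje"]))
```

Fine-tuning then reduces to standard supervised training of this head on sentences where the gold labels are derived by stripping commas from correct text and marking the tokens they followed.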

DOI: https://doi.org/10.2478/jazcas-2025-0024 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 268–278
Published on: Nov 27, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2025 Jakub Machura, Hana Žižková, Patrik Stano, Tereza Vrabcová, Dana Hlaváčková, Ondřej Trnovec, published by Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.