Tailored Fine-tuning for Comma Insertion in Czech

Abstract

Transfer learning techniques, particularly pre-trained Transformer models, leverage vast amounts of text in a given language and can be tailored to specific grammar correction tasks, such as automatic punctuation correction. The Czech pre-trained RoBERTa model demonstrates outstanding performance on this task (Machura et al. 2022); however, previous attempts to improve the model have so far led to a slight degradation (Machura et al. 2023). In this paper, we present a more targeted fine-tuning of this model, addressing linguistic phenomena that the base model overlooked. Additionally, we provide a comparison with other models trained on a more diverse dataset that extends beyond web texts.
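To make the setup concrete, comma insertion is commonly framed as token classification: each word is labeled according to whether a comma should follow it. The sketch below illustrates that framing with a Czech RoBERTa checkpoint in the HuggingFace Transformers API. It is a minimal illustration, not the paper's implementation: the model name (ufal/robeczech-base) and the two-label scheme are assumptions, and the freshly initialized classification head would need fine-tuning on sentences with gold comma positions before its predictions mean anything.

```python
# Minimal sketch: comma insertion as token classification.
# Assumptions (not from the paper): the checkpoint name, the 0/1 label
# scheme, and that the classification head has already been fine-tuned
# on comma-stripped Czech sentences with gold comma positions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "ufal/robeczech-base"  # illustrative Czech RoBERTa checkpoint

# Label 1 = "a comma should follow this word", label 0 = "no comma".
# Note: from_pretrained initializes the token-classification head
# randomly; it must be fine-tuned before inference is meaningful.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def insert_commas(sentence: str) -> str:
    """Predict comma positions for a sentence with commas stripped."""
    words = sentence.split()
    # is_split_into_words + word_ids() require a fast tokenizer.
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits          # shape: (1, seq_len, 2)
    preds = logits.argmax(dim=-1)[0].tolist()
    word_ids = enc.word_ids(batch_index=0)
    # Keep the prediction of each word's last subword token.
    comma_after = [0] * len(words)
    for pos, wid in enumerate(word_ids):
        if wid is not None:
            comma_after[wid] = preds[pos]
    return " ".join(w + ("," if c else "") for w, c in zip(words, comma_after))

print(insert_commas("Myslím že přijde pokud bude mít čas"))
```

Under this framing, "more targeted fine-tuning" amounts to enriching the training data with the linguistic phenomena the base model mislabels, then continuing training on the same objective, rather than changing the architecture.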

DOI: https://doi.org/10.2478/jazcas-2025-0024 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 268 - 278
Published on: Nov 27, 2025
Published by: Slovak Academy of Sciences, Mathematical Institute
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2025 Jakub Machura, Hana Žižková, Patrik Stano, Tereza Vrabcová, Dana Hlaváčková, Ondřej Trnovec, published by Slovak Academy of Sciences, Mathematical Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.