Is it Possible to Re-Educate Roberta? Expert-Driven Machine Learning for Punctuation Correction

Open Access | Dec 2023

References

  1. Benko, V. (2015). Araneum Bohemicum Maius, verze 15.04. Ústav Českého národního korpusu FF UK, Praha 2015. Accessible at: http://www.korpus.cz.
  2. Devlin, J. et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv. /abs/1810.04805. Accessible at: https://doi.org/10.48550/arXiv.1810.04805.
  3. Chordia, V. (2021). PunKtuator: A multilingual punctuation restoration system for spoken and written text. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 312–320. Association for Computational Linguistics. Accessible at: https://doi.org/10.18653/v1/2021.eacl-demos.37.
  4. Internet Language Reference Book. (2008–2023). Praha: Ústav pro jazyk český AV ČR. Accessible at: https://prirucka.ujc.cas.cz/.
  5. Karlík, P. (2017). Vokativ. In M. Nekula et al. (eds.): Nový encyklopedický slovník češtiny. Accessible at: https://www.czechency.org/slovnik/search?action=listpub&search=vokativ.
  6. Kovář, V. et al. (2016). Evaluation and improvements in punctuation detection for Czech. In P. Sojka et al. (eds.): Text, Speech, and Dialogue, pages 287–294. Springer International Publishing.
  7. Lehečka, J. et al. (2021). Comparison of Czech Transformers on Text Classification Tasks. In L. Espinosa-Anke et al. (eds.): Statistical Language and Speech Processing. SLSP 2021. Lecture Notes in Computer Science, vol. 13062. Springer. Accessible at: https://doi.org/10.1007/978-3-030-89579-2_3.
  8. Liu, Y. et al. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv. /abs/1907.11692. Accessible at: https://doi.org/10.48550/arXiv.1907.11692.
  9. Machura, J. et al. (2022). Automatic Grammar Correction of Commas in Czech Written Texts: Comparative Study. In P. Sojka et al. (eds.): Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol. 13502. Springer. Accessible at: https://doi.org/10.1007/978-3-031-16270-1_10.
  10. Nunberg, G. (1990). The Linguistics of Punctuation. CSLI lecture notes. Cambridge University Press.
  11. Radford, A. et al. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
  12. Suchomel, V. (2018). csTenTen17, a Recent Czech Web Corpus. In Twelfth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2018, pages 111–123.
  13. Švec, J. et al. (2014). General framework for mining, processing and storing large amounts of electronic texts for language modelling purposes. Language Resources and Evaluation, 48, pages 227–248. Accessible at: https://doi.org/10.1007/s10579-013-9246-z.
  14. Švec, J. et al. (2021). Transformer-based automatic punctuation prediction and word casing reconstruction of the ASR output. In K. Ekštein et al. (eds.): Text, Speech, and Dialogue, Springer International Publishing, pages 86–94.
DOI: https://doi.org/10.2478/jazcas-2023-0052 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 357–368
Published on: Dec 25, 2023
Published by: Slovak Academy of Sciences, Mathematical Institute
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2023 Jakub Machura, Hana Žižková, Adam Frémund, Jan Švec, published by Slovak Academy of Sciences, Mathematical Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.