Development of a Database and Models for Children’s Speech in the Slovak Language for Speech-oriented Applications

Ján Staš; Stanislav Ondáš; Matúš Pleva; Matej Horváth; Richard Ševc; Patrik Michalanský

doi:10.2478/jazcas-2025-0020

Journal of Linguistics/Jazykovedný časopis

Volume 76 (2025): Issue 1 (June 2025)

Open Access

Babu, A., Wang, Ch., Tjandra, A., Lakhotia, K., Xu, Q., Goyal, N., Singh, K., von Platen, P., Saraf, Y., Pino, J., Baevski, A., Conneau, A., and Auli, M. (2022). XLS-R: Self-supervised cross-lingual speech representation learning at scale. In Proc. of INTERSPEECH 2022, Incheon, Korea, pp. 2278–2282.
Baevski, A., Zhou, H., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. In Proc. of NeurIPS 2020, Vancouver BC, Canada, pp. 12449–12460.
Barras, C., Geoffrois, E., Wu, Z., and Liberman, M. (2001). Transcriber: Development and use of a tool for assisting speech corpora production. Speech Communication, Special Issue on Speech Annotation and Corpus Tools, 33(1–2), pp. 5–22.
Batliner, A., Blomberg, M., D’Arcy, Sh., Elenius, D., Giuliani, D., Gerosa, M., Hacker, Ch., Russell, M., Steidl, S., and Wong, M. (2005). The PF_STAR children’s speech corpus. In Proc. of INTERSPEECH 2005, Lisbon, Portugal.
Bhardwaj, V., Othman, M.T.B., Kukreja, V., Belkhier, Y., Bajaj, M., Goud, B. S., Rehman, A. U., Shafiq, M., and Hamam, H. (2022). Automatic speech recognition (ASR) systems for children: A systematic literature review. Applied Sciences, 12(9), paper 4419.
Claus, F., Rosales, H. G., Petrick, R., Hain, H.-U., and Hoffmann, R. (2013). A survey about databases of children’s speech. In Proc. of INTERSPEECH 2013, Lyon, France.
Eskenazi, M., Mostow, J., and Graff, D. (1997). The CMU kids corpus. LDC97S63. Philadelphia: Linguistic Data Consortium.
Georgescu, A.-L., Pappalardo, A., Cucu, H., and Blott, M. (2021). Performance vs. hardware requirements in state-of-the-art automatic speech recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2021(28), pp. 1–30.
Gerosa, M., Giuliani, D., Narayanan, Sh., and Potamianos, A. (2009). A review of ASR technologies for children’s speech. In Proc. of WOCCI 2009, Cambridge, MA, USA.
Huber, J. E., and Stathopoulos, E. T. (1999). Formants of children, women, and men: The effects of vocal intensity variation. Journal of the Acoustical Society of America, 106(3 Pt 1), pp. 1532–1542.
Lojka, M., Viszlay, P., Staš, J., Hládek, D., and Juhár, J. (2018). Slovak broadcast news speech recognition and transcription system. In: L. Barolli – N. Kryvinska – T. Enokido – M. Takizawa (eds.): Advances in Network-Based Information Systems, LNDECT 22, Springer, Cham, pp. 385–394.
Lu, R., Shahin, M. A., and Ahmed, B. (2022). Improving children’s speech recognition by fine-tuning self-supervised adult speech representations. arXiv Preprint. Accessible at: https://arxiv.org/abs/2211.07769.
Patel, T., and Scharenborg, O. (2024). Improving end-to-end models for children’s speech recognition. Applied Sciences, 14(6), paper 2353.
Pradhan, S. S., Cole, R. A., and Ward, W. H. (2024). My Science Tutor (MyST) – A large corpus of children’s conversational speech. In Proc. of LREC-COLING 2024, Torino, Italia, pp. 12040–12045.
Pleva, M., Ondáš, S., Hládek, D., Juhár, J., and Staš, J. (2019). Building of children speech corpus for improving automatic subtitling services. In Proc. of ROCLING 2019, New Taipei City, Taiwan, pp. 325–333.
Pratap, V., Tjandra, A., Shi, B., Tomasello, P., Babu, A., Kundu, S., Elkahky, A., Ni, Z., Vyas, A., Fazel-Zarandi, M., Baevski, A., Adi, Y., Zhang, X., Hsu, W.-N., Conneau, A., and Auli, M. (2024). Scaling speech technology to 1,000+ languages. Journal of Machine Learning Research, 25, pp. 1–52.
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, Ch., and Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In Proc. of ICML 2023, Honolulu, Hawaii, USA, pp. 28492–28518.
Sanchez, A., Meylan, S. C., Braginsky, M., MacDonald, K. E., Yurovsky, D., and Frank, M. C. (2019). childes-db: A flexible and reproducible interface to the child language data exchange system. Behavior Research Methods, 51, pp. 1928–1941.
Shivakumar, P. G., and Georgiou, P. (2020). Transfer learning from adult to children for speech recognition: Evaluation, analysis, and recommendations. Computer Speech & Language, 63, paper 101077.
Shobaki, K., Hosom, J.-P., and Cole, R. A. (2000). The OGI kids’ speech corpus and recognizers. In Proc. of ICSLP 2000, Beijing, China, pp. 1–4.
Sobti, R., Guleria, K., and Kadyan, V. (2024). Comprehensive literature review on children automatic speech recognition system, acoustic linguistic mismatch approaches and challenges. Multimedia Tools and Applications, 83, pp. 81933–81995.
Yeung, G., and Alwan, A. (2018). On the difficulties of automatic speech recognition for kindergarten-aged children. In Proc. of INTERSPEECH 2018, Hyderabad, India, pp. 1661–1665.
DOI: https://doi.org/10.2478/jazcas-2025-0020 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597

Language: English

Page range: 223–233

Published on: Nov 27, 2025

Published by: Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics

In partnership with: Paradigm Publishing Services

Publication frequency: 3 issues per year

Keywords: acoustic model, automatic speech recognition, data augmentation, children’s speech, speech database

Related subjects: Linguistics and semiotics, Theoretical frameworks and disciplines, Linguistics, other

© 2025 Ján Staš, Stanislav Ondáš, Matúš Pleva, Matej Horváth, Richard Ševc, Patrik Michalanský, published by Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.