Fine-Tuning South Tyrolean Dialect-to-Standard German ASR with AlpiLinK

Greta H. Franzini; Luca Ducceschi

doi:10.5334/johd.533

Journal of Open Humanities Data

Volume 12 (2026): Issue 1

Open Access

Jun 2026
