Fine-Tuning South Tyrolean Dialect-to-Standard German ASR with AlpiLinK

Open Access | Jun 2026

Abstract

This paper presents ongoing research on the reuse of the AlpiLinK corpus within a broader effort to adapt Automatic Speech Recognition (ASR) technology to the linguistic challenges posed by South Tyrolean dialect, a cluster of Upper German varieties spoken in the multilingual province of South Tyrol. While Standard German dominates written communication, everyday speech frequently occurs in dialect, whose phonological, morphological and lexical divergence from the standard limits the performance of mainstream ASR systems. Within this context, we develop fine-tuned models built upon openly available ASR architectures and trained on domain-specific data, with Standard German as the target written output. Central to this work is the repurposing of the only publicly available dataset for this language pair, the AlpiLinK Corpus, which was originally created for dialectological research rather than for machine-learning applications. We combine AlpiLinK with additional sources, including online media, partner-contributed recordings and in-house material, within an ongoing data collection and refinement process. We describe the characteristics, strengths and limitations of AlpiLinK from the perspective of its reuse for ASR fine-tuning, alongside the methodological steps required for data preparation and model training in this setting. Preliminary results indicate that the current fine-tuned model substantially accelerates transcription workflows, while still exhibiting weaknesses in dialect-specific forms, punctuation, named entities and multilingual interference. We conclude by outlining recommendations for improving datasets intended for low-resource dialectal ASR, with particular attention to audio quality, licensing considerations, metadata stability and explicit consent for machine-learning-related reuse.

DOI: https://doi.org/10.5334/johd.533 | Journal eISSN: 2059-481X
Language: English
Page range: 74 - 74
Submitted on: Mar 1, 2026
Accepted on: May 5, 2026
Published on: Jun 8, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Greta H. Franzini, Luca Ducceschi, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.