Table 1
Current overview of ASR training data used in this study (v1.3).
| SOURCE | TYPE | HOURS | SPEAKERS | AGE GROUP | PROVENANCE |
|---|---|---|---|---|---|
| Learner textbook | scripted | 47 m | 3 | 20–49 | public |
| AlpiLinK | scripted | 4 h 47 m | 180 | 10–89 | public |
| In-house recordings | scripted, spontaneous | 3 h 42 m | 3 | 20–49 | in-house, contributed |
| Audiovisual archives | scripted, spontaneous | 1 h 53 m | 9 | 0–99 | contributed |
| Promotional videos | spontaneous | 1 h 03 m | 86 | 20–79 | public |
| TOTAL | | 13 h 16 m | | | |
Table 2
Training data evolution and ASR performance, computed on the same held-out test set derived from version v1.3 of the training data. Model v1.2 yields the best overall performance: it ties v1.3 on WER (0.24) and has the highest BLEU (69.13).
| MODEL | TRAINING DATA: TOKENS | TRAINING DATA: DURATION | PERFORMANCE (V1.3): WER ↓ | PERFORMANCE (V1.3): BLEU ↑ |
|---|---|---|---|---|
| baseline | n/a (no fine-tuning) | n/a | 0.46 | 44.58 |
| v1.0 | 51,474 | 6 h 39 m | 0.37 | 0.52 |
| v1.1 | 75,203 | 9 h 07 m | 0.27 | 65.65 |
| v1.2 | 87,298 | 10 h 16 m | 0.24 | 69.13 |
| v1.3 | 88,810 | 10 h 26 m | 0.24 | 68.73 |
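
The WER column in Table 2 is reported as a fraction, so 0.24 corresponds to roughly one word error per four reference words. As a minimal sketch of how such a score is commonly obtained, the snippet below computes the word-level edit distance between a reference and a hypothesis transcript and divides it by the reference length. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are illustrative assumptions; the study's actual scoring tool and text normalization are not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by splitting on whitespace, with no
    casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

A corpus-level score is usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance values; which of the two the table reports is not specified.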
