
Figure 1
Schematic overview of methodological steps.

Figure 2
Mean topic probabilities across the FBMSet‑805 dataset grouped by acoustic property.

Figure 3
Comparison of factor reliabilities across participants in the ground truth data with explanatory power on the full dataset, measured by R² of the regressor predictions. Factor reliabilities are taken from Virkus et al. (2025c) and Virkus et al. (2025b).

Table 1
Overview of model configurations tested during model selection, combining two learning paradigms, three output strategies, and optional metadata inclusion.

Table 2
Overview of prediction performance on the test dataset, measured by the coefficient of determination for the tested criteria. Means and standard deviations are given where results were averaged over multiple models.

| Learning Paradigm | Deep Neural Network | Deep Neural Network | Random Forest | Random Forest |
|---|---|---|---|---|
| Industry Metadata | No | Yes | No | Yes |
| Output Configuration | | | | |
| multi‑output_all | 0.029 | 0.054 | 0.126 | 0.160 |
| multi‑output_level | 0.042 ± 0.02 | 0.053 ± 0.06 | 0.104 ± 0.05 | 0.139 ± 0.07 |
| single‑output | 0.010 ± 0.06 | 0.023 ± 0.09 | 0.122 ± 0.09 | 0.135 ± 0.09 |
| Baseline (Linear Regression) | 0.077 | | | |
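
For readers unfamiliar with the metric in Table 2, the following is a minimal sketch of the standard coefficient of determination for a single output. The function name `r_squared` and the toy values are illustrative assumptions; how the reported scores were aggregated across multiple outputs or models is not specified here and is not reproduced.

```python
def r_squared(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical, not taken from the study).
targets = [1.0, 2.0, 3.0]
perfect = r_squared(targets, [1.0, 2.0, 3.0])      # predictions match exactly
mean_only = r_squared(targets, [2.0, 2.0, 2.0])    # always predicts the mean
```

Under this definition, a model that always predicts the mean of the targets scores 0, so the small positive values in Table 2 indicate a modest improvement over such a constant predictor.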

Figure 4
Permutation importances of high‑level topics (model input) for the best‑performing random forest regression model grouped by acoustic property. Feature importance reflects model sensitivity to specific acoustic patterns.
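
As an illustration of the quantity plotted in Figure 4, permutation importance can be sketched as the drop in a score (here R²) when one input column is shuffled. This is a generic, standard-library sketch under stated assumptions: the `predict` function, the toy data, and the parameters `n_repeats` and `seed` are hypothetical and do not reproduce the random forest model or the topic features used in the study.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = r_squared(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = r_squared(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model: uses feature 0 only and ignores feature 1.
def predict(row):
    return 2.0 * row[0]


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [2.0, 4.0, 6.0, 8.0]
importances = permutation_importance(predict, X, y)
```

In this toy setup the ignored feature receives an importance of exactly zero, while the feature the model relies on receives a positive value. Grouping by acoustic property, as in the figure, would amount to aggregating these per-feature values within each group.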

Figure 5
SHAP values for the best‑performing random forest regression model averaged over all samples, aggregated across acoustic property, and grouped by communication level. These values indicate the contribution of each property to predicted semantic expression in the respective communication level.
