
Predicting Perceived Semantic Expression of Functional Sounds Using Unsupervised Feature Extraction and Ensemble Learning

Open Access | Mar 2026

Figures & Tables

Figure 1

Schematic overview of methodological steps.

Figure 2

Mean topic probabilities across the FBMSet‑805 dataset grouped by acoustic property.
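As a rough illustration of how per-sound topic probabilities could be summarized in this way, the sketch below averages each topic's probability over all sounds and then groups the topics under an acoustic property. The topic-to-property mapping, the topic names, and the function name `mean_topic_probabilities` are hypothetical; the caption does not specify how topics were assigned to properties.

```python
from collections import defaultdict
from statistics import mean


def mean_topic_probabilities(doc_topic, topic_to_property):
    """Average each topic's probability over all sounds, grouped by property.

    doc_topic: list of dicts, one per sound, mapping topic name -> probability
    topic_to_property: dict mapping topic name -> acoustic property (hypothetical)
    Returns {property: {topic: mean probability over all sounds}}.
    """
    grouped = defaultdict(dict)
    for topic, prop in topic_to_property.items():
        # Mean over sounds; a topic absent from a sound's dict counts as 0.
        grouped[prop][topic] = mean(d.get(topic, 0.0) for d in doc_topic)
    return dict(grouped)
```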

Figure 3

Comparison of factor reliabilities across participants in the ground-truth data with the explanatory power of the regressor predictions on the full dataset, measured by R². Factor reliabilities are taken from Virkus et al. (2025c) and Virkus et al. (2025b).
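For reference, the coefficient of determination compares the squared prediction error with the variance of the ground truth around its mean, R² = 1 − SS_res / SS_tot. A minimal, dependency-free sketch (the function name `r_squared` is illustrative, not taken from the source) is:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant ground truth")
    return 1.0 - ss_res / ss_tot
```

R² is 1 for perfect predictions, 0 for a model that always predicts the ground-truth mean, and negative for predictions worse than that.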

Table 1

Overview of model configurations tested during model selection, combining two learning paradigms, three output strategies, and optional metadata inclusion.

Model Selection Criteria | Tested Configurations
Learning Paradigm | Deep neural network; Random forest
Model Output Configuration | Multi‑output (all); 3 × Multi‑output (communication level‑wise); 19 × Single‑output (communication dimension‑wise)
Inclusion of Metadata | Yes; No
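The configurations in Table 1 form a full factorial grid of 2 × 3 × 2 = 12 combinations. Because the level-wise setup needs 3 separate models and the dimension-wise setup needs 19, the number of fitted models differs per configuration. The sketch below enumerates the grid and counts the models, assuming each configuration was trained exactly once (the table does not say whether repeated runs or hyperparameter searches were involved); the constant and function names are illustrative.

```python
from itertools import product

LEARNING_PARADIGMS = ["deep neural network", "random forest"]
# Output strategy -> number of separate models it requires
OUTPUT_CONFIGS = {
    "multi-output (all)": 1,
    "multi-output (communication level-wise)": 3,
    "single-output (communication dimension-wise)": 19,
}
METADATA_OPTIONS = [True, False]


def configuration_grid():
    """Return every (paradigm, output config, uses_metadata) combination."""
    return list(product(LEARNING_PARADIGMS, OUTPUT_CONFIGS, METADATA_OPTIONS))


def total_models(grid):
    """Count fitted models, assuming one training run per configuration."""
    return sum(OUTPUT_CONFIGS[output] for _, output, _ in grid)


grid = configuration_grid()
print(len(grid), total_models(grid))  # 12 92
```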

Table 2

Overview of prediction performance on the test dataset, measured by the coefficient of determination R², for each tested configuration. Where results were averaged over multiple models, the mean ± standard deviation of test R² is given.

Learning Paradigm | Deep Neural Network | Deep Neural Network | Random Forest | Random Forest
Industry Metadata | No | Yes | No | Yes
Output Configuration | | | |
multi‑output_all | 0.029 | 0.054 | 0.126 | 0.160
multi‑output_level | 0.042 ± 0.02 | 0.053 ± 0.06 | 0.104 ± 0.05 | 0.139 ± 0.07
single‑output | 0.010 ± 0.06 | 0.023 ± 0.09 | 0.122 ± 0.09 | 0.135 ± 0.09
Baseline (Linear Regression) | 0.077
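Where a configuration consists of several models (3 level-wise or 19 dimension-wise), Table 2 reports the mean ± standard deviation of their test R². A sketch of that summary step follows. Whether the sample or population standard deviation was used is not stated, so the sample version (`statistics.stdev`) is an assumption here, as is the helper name `summarize_r2`.

```python
from statistics import mean, stdev


def summarize_r2(per_model_r2, baseline=None):
    """Return (mean, sd, beats_baseline) for a list of per-model test R² values.

    A single model yields sd = None (like the multi-output_all row);
    beats_baseline is None when no baseline R² is given.
    """
    m = mean(per_model_r2)
    # Sample standard deviation (n - 1 denominator): an assumption.
    sd = stdev(per_model_r2) if len(per_model_r2) > 1 else None
    beats = None if baseline is None else m > baseline
    return m, sd, beats
```

Passing the linear-regression baseline value (0.077) as `baseline` mirrors the comparison row at the bottom of Table 2.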
Figure 4

Permutation importances of high‑level topics (model input) for the best‑performing random forest regression model grouped by acoustic property. Feature importance reflects model sensitivity to specific acoustic patterns.
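Permutation importance measures how much a fitted model's score drops when a single input column is shuffled, which breaks that feature's link to the target. The following is a generic, model-agnostic sketch that assumes a `predict` callable, R² as the score, and a fixed number of shuffles; it is not the authors' exact procedure, and the function names are illustrative.

```python
import random


def _r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R² when each feature column of X is shuffled.

    predict: callable mapping a list of feature rows to a list of predictions
    X: list of rows (lists of floats), e.g. per-sound topic probabilities
    """
    rng = random.Random(seed)
    reference = _r2(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Build new rows so the original X is never modified.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(reference - _r2(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of exactly zero, while a feature the model relies on gets a positive score.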

Figure 5

SHAP values for the best‑performing random forest regression model, averaged over all samples, aggregated by acoustic property, and grouped by communication level. These values indicate the contribution of each property to the predicted semantic expression at the respective communication level.
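SHAP values are Shapley values that split a single prediction into additive per-feature contributions, so the contributions of several topics can be summed into one value per acoustic property. The sketch below computes exact Shapley values against one fixed baseline input, which is only feasible for a handful of features; libraries such as `shap` provide faster model-specific algorithms (for example TreeSHAP for tree ensembles). The baseline choice, the grouping dict, and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their values from x, all others from baseline.
        z = [x[k] if k in subset else baseline[k] for k in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def aggregate_by_group(phi, groups):
    """Sum per-feature contributions within each group (e.g. acoustic property)."""
    return {name: sum(phi[k] for k in idx) for name, idx in groups.items()}
```

Averaging such per-sample group sums over all sounds gives a dataset-level summary; the caption does not say whether signed or absolute values were averaged.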

Table 3

Binomial test results of the validation experiment comparing model predictions with user perception within each communication level for 17 sounds from FBMSet‑805.

Level | Status | Appeal | Brand Identity
n | 329 | 235 | 329
accuracy | 38.3% | 27.66% | 32.22%
p‑value | <0.001 | 0.003 | <0.001
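Table 3 reports how often the model predictions matched participants' judgments and whether that accuracy exceeds chance. A one-sided exact binomial test can be computed with the standard library alone, as sketched below. The chance level `p0` depends on how many response options participants had, which this excerpt does not state, so it is left as a parameter; the one-sided alternative and the function name are also assumptions.

```python
from math import comb


def binomial_test_greater(k, n, p0):
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, p0)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    # Sum the upper tail of the binomial probability mass function.
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))
```

For the Status level, 38.3% of n = 329 corresponds to about 126 matches, so `binomial_test_greater(126, 329, p0)` would be the corresponding call once the chance level `p0` is known.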
DOI: https://doi.org/10.5334/tismir.290 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jun 30, 2025 | Accepted on: Jan 26, 2026 | Published on: Mar 2, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Annika Frommholz, Steffen Lepa, Tom Virkus, Stefan Weinzierl, Johannes Helberger, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.