Abstract
Functional sounds (typically brief, nonverbal audio cues used in the interfaces of electronic devices) play a critical role in human–machine interaction but remain largely unexplored within music information retrieval (MIR). This study proposes a data-driven framework that uses musically informed audio features to predict the perceived semantic expression of functional sounds. In the first stage of our three-stage pipeline, unsupervised feature extraction transforms 805 functional sounds into high-level topic distributions over timbre, chroma, and loudness using Gaussian mixture models and latent Dirichlet allocation. In the second stage, these features are used to train multi-output regression models that predict 19 perceptual dimensions from the FBMUX framework, with a random forest regressor achieving the best performance. Finally, a listening experiment assesses how well the model predictions align with user perceptions, and interpretability analyses reveal how individual features contribute to those predictions. This work expands the scope of MIR to the domain of functional, non-musical audio and presents a novel application of MIR techniques, demonstrating that structured, musically informed descriptors can support perceptual modeling in domains with limited data and high subjective variance. The resulting approach is transferable and highlights the potential of MIR to inform human–machine interaction and sound design.

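To make the pipeline summarized above more concrete, the following minimal sketch illustrates how topic-style features could feed a multi-output random forest, in the spirit of the second stage. It is not the authors' code: the use of scikit-learn, the array names, the count-based LDA input, and all hyperparameters are assumptions made for illustration; only the 805 sounds and 19 FBMUX dimensions come from the abstract.

    # Illustrative sketch only; library choice, variable names, and settings are assumed.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical inputs: per-sound counts of quantized frame-level descriptors
    # (e.g., GMM component assignments for timbre frames), shape (805, 64),
    # and listener ratings on the 19 FBMUX dimensions, shape (805, 19).
    frame_counts = np.random.randint(0, 50, size=(805, 64))
    ratings = np.random.rand(805, 19)

    # Stage 1 (sketch): LDA converts per-sound component counts into topic distributions.
    lda = LatentDirichletAllocation(n_components=10, random_state=0)
    topic_features = lda.fit_transform(frame_counts)

    # Stage 2 (sketch): a random forest handles multi-output regression natively,
    # predicting all 19 perceptual dimensions at once.
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(topic_features, ratings)
    predicted_ratings = model.predict(topic_features)
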