
Predicting Perceived Semantic Expression of Functional Sounds Using Unsupervised Feature Extraction and Ensemble Learning

Open Access | Mar 2026

References

  1. Anzenbacher, C., Czedik‑Eysenberg, I., Reuter, C., and Oehler, M. (2017). Der klang der marken ‑ untersuchungen zu branchentypischen eigenschaften von audiologos. In W. Auhagen, C. Bullerjahn, and C. Louven (Eds.), Musikpsychologie. Jahrbuch der Deutschen Gesellschaft für Musikpsychologie. Band 27: Akustik und musikalische Hörwahrnehmung. Hogrefe.
  2. Arnal, L. H., Flinker, A., Kleinschmidt, A., Giraud, A.‑L., and Poeppel, D. (2015). Human screams occupy a privileged niche in the communication soundscape. Current Biology, 25(15), 2051–2056.
  3. Aucouturier, J., and Pachet, F. (2004). Tools and architecture for the evaluation of similarity measures: Case study of timbre similarity. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR). Barcelona, Spain: ISMIR.
  4. Aures, W. (1985). Der sensorische wohlklang als funktion psychoakustischer empfindungsgrößen. Acustica, 58, 282–290.
  5. Bailes, F., Stevens, C., Dean, R., and Olsen, K. (2015). Both acoustic intensity and loudness contribute to time‑series models of perceived affect in response to music. Psychomusicology: Music, Mind & Brain, 25(2), 124–137.
  6. Bian, W. (2018). Convolutional neural networks for music mood classification tasks. Submitted to the MIREX Challenge 2018. https://www.music-ir.org/mirex/abstracts/2018/WB1.pdf.
  7. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
  8. Brewster, S. A., Wright, P. C., and Edwards, A. D. (1994). A detailed investigation into the effectiveness of earcons. In Santa Fe Institute Studies in the Sciences of Complexity‑Proceedings (Vol. 18, pp. 471–471). Addison‑Wesley Publishing Co.
  9. Cabrera, D., Ferguson, S., Rizwi, F., and Schubert, E. (2008). Psysound3: A program for the analysis of sound recordings. The Journal of the Acoustical Society of America, 123(5_Supplement), 3247.
  10. Cai, L., Ferguson, S., Lu, H., and Fang, G. (2022). Feature selection approaches for optimising music emotion recognition methods. In Artificial Intelligence, Soft Computing and Applications (pp. 09–27). Academy and Industry Research Collaboration Center (AIRCC).
  11. Cao, C., and Li, M. (2009). Thinkit’s submissions for mirex2009 audio music classification and similarity tasks. Submitted to the MIREX Challenge 2009. https://www.music-ir.org/mirex/abstracts/2009/CL.pdf.
  12. Di Stefano, N., Vuust, P., and Brattico, E. (2022). Consonance and dissonance perception: A critical review of the historical sources, multidisciplinary findings, and main hypotheses. Physics of Life Reviews, 43, 273–304.
  13. Fastl, H., Kerber, S., and Guzsvany, N. (2007). Untersuchungen zur aufschreckenden wirkung (startling) synthetischer geräusche. In Proceedings of the DAGA Conference 2007 (pp. 559–560). Stuttgart: DAGA.
  14. Flexer, A. (2014). On inter‑rater agreement in audio music similarity. In H. Wang, Y. Yang, and J. H. Lee (Eds.), Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR) (pp. 245–250). Taipei, Taiwan: ISMIR.
  15. Flexer, A., Lallai, T., and Rašl, K. (2021). On evaluation of inter‑ and intra‑rater agreement in music recommendation. Transactions of the International Society for Music Information Retrieval, 4(1), 182.
  16. Friberg, A., Schoonderwaldt, E., Hedblad, A., Fabiani, M., and Elowsson, A. (2014). Using listener‑based perceptual features as intermediate representations in music information retrieval. The Journal of the Acoustical Society of America, 136(4), 1951–1963.
  17. Frommholz, A. (2026). SoundInnovationLab/somunicate‑model‑selection: Stable release with data. 10.5281/zenodo.18404881.
  18. Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org.
  19. Graakjær, N. J., and Bonde, A. (2018). Non‑musical sound branding – a conceptualization and research overview. European Journal of Marketing, 52(7/8), 15051525.
  20. Green Forge Coop. (2024). Mosqito. Version 1.2.1. 10.5281/zenodo.11026796.
  21. Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., … Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585, 357–362.
  22. Hermann, T., Hunt, A., and Neuhoff, J. G. (Eds.). (2011). The Sonification Handbook. Logos Publishing House.
  23. Herzog, M., Lepa, S., and Egermann, H. (2016). Towards automatic music recommendation for audio branding scenarios. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York City, USA: ISMIR.
  24. Hu, D. J., and Saul, L. K. (2009). A probabilistic topic model for music analysis. In Proceedings of NIPS (Vol. 9). CiteSeer.
  25. Jurafsky, D., and Martin, J. H. (2025). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition With Language Models. Online manuscript released January 12, 2025.
  26. Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20(1), 141–151.
  27. Kang, J., and Herremans, D. (2024). Are we there yet? A brief survey of music emotion prediction datasets, models and outstanding challenges. arXiv preprint arXiv:2406.08809.
  28. Kim, S., Georgiou, P., and Narayanan, S. (2012). Latent acoustic topic models for unstructured audio classification. APSIPA Transactions on Signal and Information Processing, 1(1), e6.
  29. Knoeferle, K. (2012). Using customer insights to improve product sound design. Marketing Review St. Gallen, 29, 47–53.
  30. Lartillot, O., Toiviainen, P., and Eerola, T. (2008). A Matlab toolbox for music information retrieval. In C. Preisach, H. Burkhardt, L. Schmidt‑Thieme, and R. Decker (Eds.), Data Analysis, Machine Learning and Applications (pp. 261–268). Springer Berlin Heidelberg.
  31. Lepa, S., Herzog, M., Steffens, J., Schönrock, A., and Egermann, H. (2020a). A computational model for predicting perceived musical expression in branding scenarios. Journal of New Music Research, 49(4), 387402. 10.1080/09298215.2020.1778041.
  32. Lepa, S., Steffens, J., Herzog, M., and Egermann, H. (2020b). Popular music as entertainment communication: How perceived semantic expression explains liking of previously unknown music. Media and Communication, 8(3), 191–204.
  33. Lundberg, S. M., and Lee, S.‑I. (2017). A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 30). Curran Associates, Inc.
  34. Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., and Lee, S.‑I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67.
  35. Mahalanobis, P. C. (1936). On the generalised distance in statistics. Sankhya A, 80(Suppl 1), 1–7.
  36. Mas, L., Bolls, P., Rodero, E., Barreda‑Ángeles, M., and Churchill, A. (2021). The impact of the sonic logo’s acoustic features on orienting responses, emotions and brand personality transmission. Journal of Product & Brand Management, 30(5), 740–753.
  37. Mauch, M., MacCallum, R. M., Levy, M., and Leroi, A. M. (2015). The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 150081.
  38. McDonald, R. P. (1999). Test Theory: A Unified Treatment (1st ed.). Psychology Press.
  39. McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., and Nieto, O. (2015). librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference (Vol. 8).
  40. McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (Vol. 445, pp. 51–56). Austin, TX: Python in Science Conference.
  41. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., … Chintala, S. (2011). Psychoacoustic experiments on feasible sound levels of possible warning signals for quiet vehicles. In Proceedings of the DAGA Conference 2011 (pp. 583–584). Düsseldorf: DAGA.
  42. Özcan, E., and van Egmond, R. (2012). Basic semantics of product sounds. International Journal of Design, 6, 41–54.
  43. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., … Chintala, S. (2019). PyTorch: An imperative style, high‑performance deep learning library. In Advances in Neural Information Processing Systems 32 (pp. 8024–8035). Curran Associates, Inc.
  44. Pearce, A., Brookes, T., and Mason, R. (2019). Modelling timbral hardness. Applied Sciences, 9(3), 466.
  45. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit‑learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  46. Peeters, G. (2008). A generic training and classification system for mirex08 classification tasks: Audio music mood, audio genre, audio artist and audio tag. Submitted to the MIREX Challenge 2008. https://www.music-ir.org/mirex/abstracts/2008/Peeters_2008_ISMIR_MIREX.pdf.
  47. Řehůřek, R., and Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (pp. 45–50). Valletta, Malta: ELRA.
  48. Rocchesso, D., Delle Monache, S., and Barrass, S. (2019). 50 years of the International Journal of Human‑Computer Studies. Reflections on the past, present and future of human‑centred technologies. International Journal of Human‑Computer Studies, 131, 152–159.
  49. Russell, J. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161–1178.
  50. Schedl, M., Flexer, A., and Urbano, J. (2013). The neglected user in music information retrieval research. Journal of Intelligent Information Systems, 41(3), 523–539.
  51. Schulz von Thun, F. (1981). Miteinander Reden 1: Störungen und Klärungen: Allgemeine Psychologie der Kommunikation (48th ed.). Rowohlt Taschenbuch.
  52. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461464.
  53. Serafin, S., Buxton, B., Gaver, B., and Bly, S. (2022). Auditory Interfaces (1st ed.). Focal Press.
  54. Song, G., Ding, S., and Wang, Z. (2018). Audio classification tasks using recurrent neural network. Submitted to the MIREX Challenge 2018. https://www.music-ir.org/mirex/abstracts/2018/GS1.pdf.
  55. Tardieu, D., Charbuillet, C., Cornu, F., and Peeters, G. (2011). Mirex‑2011 single‑label and multi‑label classification tasks: Ircamclassification2011 submission. Submitted to the MIREX Challenge 2011. https://www.music-ir.org/mirex/abstracts/2011/TCCP4.pdf.
  56. Techawachirakul, M., Pathak, A., Motoki, K., and Calvert, G. A. (2023). Influencing brand personality with sonic logos: The role of musical timbre. Journal of Business Research, 168, 114169.
  57. Tzanetakis, G., and Cook, P. (2000). Marsyas: A framework for audio analysis. Organised Sound, 4(3), 169–175.
  58. Tzanetakis, G., and Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302.
  59. Virkus, T., Lepa, S., and Frommholz, A. (2025a). Validation of the FBMUX questionnaire for measuring communicative expression of UX sounds on 3 levels. Preprint.
  60. Virkus, T., Lepa, S., and Helberger, J. (2025c). The semantic expression space of UX sounds on the functional level of product communication: An exploratory study with sound designers and consumers. Preprint.
  61. Virkus, T., Lepa, S., Frommholz, A., and Helberger, J. (2025b). The semantic expression space of UX sounds on the brand identity level of product communication. In Mensch und Computer 2025 ‑ Workshopband. Gesellschaft für Informatik e.V.
  62. von Bismarck, G. (1974). Timbre of steady sounds: A factorial investigation of its verbal attributes. Acustica, 30, 146–159.
  63. Wang, J.‑C., Yang, Y.‑H., Wang, H.‑M., and Jeng, S.‑K. (2012). The acoustic emotion Gaussians model for emotion‑based music annotation and retrieval. In Proceedings of the 20th ACM International Conference on Multimedia (pp. 89–98). New York, NY, USA: Association for Computing Machinery.
  64. Yang, Y.‑H., Lin, Y.‑C., Su, Y.‑F., and Chen, H. H. (2008). A regression approach to music emotion recognition. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 448–457.
DOI: https://doi.org/10.5334/tismir.290 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jun 30, 2025 | Accepted on: Jan 26, 2026 | Published on: Mar 2, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Annika Frommholz, Steffen Lepa, Tom Virkus, Stefan Weinzierl, Johannes Helberger, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.