Abstract
Functional sounds (typically brief, nonverbal audio cues used in the interfaces of electronic devices) play a critical role in human–machine interaction but remain largely unexplored within music information retrieval (MIR). This study proposes a data-driven framework that uses musically informed audio features to predict the perceived semantic expression of functional sounds. In the first stage of our three-stage pipeline, unsupervised feature extraction transforms 805 functional sounds into high-level topic distributions over timbre, chroma, and loudness using Gaussian mixture models and latent Dirichlet allocation. In the second stage, these features are used to train multi-output regression models that predict 19 perceptual dimensions from the FBMUX framework, with a random forest regressor achieving the best performance. Finally, a listening experiment assesses how well the model predictions align with user perceptions, and interpretability analyses reveal how individual features contribute to those predictions. This work expands the scope of MIR to the domain of functional, non-musical audio and presents a novel application of MIR techniques, demonstrating that structured, musically informed descriptors can support perceptual modeling in domains with limited data and high subjective variance. The resulting approach is transferable and highlights the potential of MIR to inform human–machine interaction and sound design.

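To make the pipeline summarized above more concrete, the following minimal sketch illustrates how topic-style features could feed a multi-output random forest, in the spirit of the second stage. It is not the authors' code: the use of scikit-learn, the array names, the count-based LDA input, and all hyperparameters are assumptions made for illustration; only the 805 sounds and 19 FBMUX dimensions come from the abstract.

    # Illustrative sketch only; library choice, variable names, and settings are assumed.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical inputs: per-sound counts of quantized frame-level descriptors
    # (e.g., GMM component assignments for timbre frames), shape (805, 64),
    # and listener ratings on the 19 FBMUX dimensions, shape (805, 19).
    frame_counts = np.random.randint(0, 50, size=(805, 64))
    ratings = np.random.rand(805, 19)

    # Stage 1 (sketch): LDA converts per-sound component counts into topic distributions.
    lda = LatentDirichletAllocation(n_components=10, random_state=0)
    topic_features = lda.fit_transform(frame_counts)

    # Stage 2 (sketch): a random forest handles multi-output regression natively,
    # predicting all 19 perceptual dimensions at once.
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(topic_features, ratings)
    predicted_ratings = model.predict(topic_features)
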