Nuanced Music Emotion Recognition via a Semi‑Supervised Multi‑Relational Graph Neural Network

Open Access | Jun 2025

Figures & Tables

Figure 1

The Geneva Emotional Music Scale (GEMS) with nine dimensions, based on the factor analysis by Zentner et al. (2008).

Figure 2

Illustration of SRGNN‑Emo, which constructs a multi‑relational graph whose nodes represent tracks and whose edges connect tracks that share sessions, genres, or user tags. We used stochastic graph augmentations to generate two distinct graph views, which were processed by a shared encoder to obtain robust and invariant node representations in a self‑supervised manner. Optimizing the emotion‑guided consistency objective aligned unlabeled nodes with the emotion profile patterns of labeled nodes across the augmented graph views. The learned node representations were then fed into a multi‑layer perceptron regressor to predict the emotion profile of each track.
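As a rough illustration of the graph construction and view generation described in this caption, the following Python sketch links tracks that share a session, genre, or tag, and then creates two randomly perturbed views by dropping edges. The toy metadata, the helper names `build_relation_edges` and `drop_edges`, the drop probability, and the choice of edge dropping as the augmentation are all assumptions made for illustration; the caption does not state which augmentations SRGNN‑Emo applies.

```python
import random
from itertools import combinations


def build_relation_edges(track_attrs):
    """Connect two tracks if they share at least one attribute value.

    track_attrs maps a track id to a set of values (e.g. tag strings).
    Returns a set of undirected edges stored as sorted (u, v) tuples.
    """
    edges = set()
    for u, v in combinations(sorted(track_attrs), 2):
        if track_attrs[u] & track_attrs[v]:
            edges.add((u, v))
    return edges


def drop_edges(edges, drop_prob, rng):
    """Stochastic augmentation (assumed): keep each edge with prob. 1 - drop_prob."""
    # Iterate in sorted order so results are reproducible for a given seed.
    return {e for e in sorted(edges) if rng.random() >= drop_prob}


# Hypothetical toy metadata for four tracks (not taken from the paper).
sessions = {"t1": {"s1"}, "t2": {"s1", "s2"}, "t3": {"s2"}, "t4": {"s3"}}
genres = {"t1": {"rock"}, "t2": {"jazz"}, "t3": {"jazz"}, "t4": {"rock"}}
tags = {"t1": {"calm"}, "t2": {"calm"}, "t3": {"dark"}, "t4": {"dark"}}

# One edge set per relation type: a simple multi-relational graph.
graph = {
    "session": build_relation_edges(sessions),
    "genre": build_relation_edges(genres),
    "tag": build_relation_edges(tags),
}

# Two independently augmented views of the same graph.
rng = random.Random(0)
view_a = {rel: drop_edges(e, 0.3, rng) for rel, e in graph.items()}
view_b = {rel: drop_edges(e, 0.3, rng) for rel, e in graph.items()}
```

In a full pipeline, both views would be passed through the same encoder so that a node's two representations can be compared; that step is omitted here.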

Table 1

Multi‑target regression performance of different models across three track representation types (musicnn, MAEST, and Jukebox). The best results are in boldface and the second‑best results are underlined. All improvements of SRGNN‑Emo over the second‑best performing model are significant (Wilcoxon signed‑rank test). LP and DOMR+, whose scores are identical for all three representations, do not use any underlying track representation.

| Model | musicnn RMSE (SE) | musicnn R² (SE) | MAEST RMSE (SE) | MAEST R² (SE) | Jukebox RMSE (SE) | Jukebox R² (SE) |
|---|---|---|---|---|---|---|
| LR | 0.8443 (0.02) | 0.2470 (0.05) | 1.3821 (0.06) | -1.0731 (0.22) | 1.0301 (0.04) | -0.1403 (0.09) |
| SVR | 0.8188 (0.01) | 0.2968 (0.01) | 0.7862 (0.01) | 0.3504 (0.02) | 0.9802 (0.02) | 0.0163 (0.01) |
| COREG | 0.8742 (0.02) | 0.1140 (0.05) | 0.8613 (0.02) | 0.1346 (0.08) | 0.8680 (0.02) | 0.1244 (0.05) |
| MLP | 0.8132 (0.02) | 0.3106 (0.02) | 0.8938 (0.03) | 0.1576 (0.08) | 0.8579 (0.02) | 0.2193 (0.06) |
| LP | 0.9488 (0.03) | 0.0806 (0.01) | 0.9488 (0.03) | 0.0806 (0.01) | 0.9488 (0.03) | 0.0806 (0.01) |
| GCN | 0.8071 (0.02) | 0.3158 (0.04) | 0.7781 (0.02) | 0.3568 (0.05) | 0.7492 (0.04) | 0.4039 (0.05) |
| GAT | 0.8167 (0.03) | 0.2992 (0.07) | 0.7856 (0.02) | 0.3476 (0.05) | 0.7567 (0.02) | 0.3926 (0.03) |
| DGI | 0.8042 (0.02) | 0.3184 (0.06) | (0.01) | 0.3644 (0.06) | 0.7464 (0.02) | (0.04) |
| BGRL | (0.02) | (0.05) | 0.7939 (0.02) | 0.3370 (0.07) | 0.7905 (0.02) | 0.3843 (0.05) |
| MRLGCN | 0.8592 (0.04) | 0.2600 (0.04) | 0.7868 (0.03) | (0.05) | 0.7932 (0.03) | 0.3651 (0.05) |
| DOMR+ | 0.8291 (0.03) | 0.2777 (0.08) | 0.8291 (0.03) | 0.2777 (0.08) | 0.8291 (0.03) | 0.2777 (0.08) |
| SRGNN‑Emo | 0.7973 (0.03) | 0.3305 (0.06) | 0.7707 (0.01) | 0.3724 (0.05) | 0.7411 (0.02) | 0.4180 (0.04) |

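To make the two metric columns of Table 1 concrete, here is a minimal Python sketch of how RMSE and a coefficient of determination can be computed for a multi‑target regression. It assumes that the second metric in the table is R² and that scores are computed per target and then averaged over targets; neither assumption is confirmed by the table itself, and the function names and toy data are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for a single target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination for a single target."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def multi_target_scores(Y_true, Y_pred):
    """Average per-target RMSE and R^2 over all target columns (assumed aggregation)."""
    cols_true = list(zip(*Y_true))
    cols_pred = list(zip(*Y_pred))
    k = len(cols_true)
    avg_rmse = sum(rmse(t, p) for t, p in zip(cols_true, cols_pred)) / k
    avg_r2 = sum(r2(t, p) for t, p in zip(cols_true, cols_pred)) / k
    return avg_rmse, avg_r2


# Hypothetical toy data: three tracks, two emotion targets.
Y_true = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
Y_mean = [[2.0, 4.0]] * 3                      # always predicts the target mean
Y_bad = [[3.0, 6.0], [2.0, 4.0], [1.0, 2.0]]   # predictions in reversed order

scores_perfect = multi_target_scores(Y_true, Y_true)
scores_mean = multi_target_scores(Y_true, Y_mean)
scores_bad = multi_target_scores(Y_true, Y_bad)
```

A model that always predicts the target mean scores an R² of zero, and a model that does worse than that scores below zero. Under this reading, the negative entries for LR with MAEST and Jukebox representations indicate predictions worse than a constant mean baseline.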
Table 2

RMSE scores of models (using the best‑performing representations from Table 1) for the individual emotion targets. The column abbreviations stand for wonder, transcendence, tenderness, nostalgia, peacefulness, joyful activation, power, sadness, and tension. For each emotion dimension, the improvement of the best‑performing model (boldface) over the second‑best model (underlined) is statistically significant (Wilcoxon signed‑rank test).

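| Model | wond | tran | tend | nost | peace | joya | power | sadn | tens | GEMS‑9 |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP (musicnn) | 0.9312 | 0.9653 | 0.7330 | 0.8936 | 0.6466 | 0.8099 | 0.8007 | 0.7711 | 0.7675 | 0.8132 |
| DGI (Jukebox) | 0.9059 | 0.9425 | 0.6647 | 0.8094 | | 0.7162 | 0.7511 | 0.6627 | 0.6569 | 0.7464 |
| SRGNN‑Emo (Jukebox) | 0.8972 | 0.9345 | 0.6518 | 0.8026 | 0.6162 | 0.6930 | 0.7425 | | | 0.7411 |
| (A) w/o | 0.9177 | 0.9384 | 0.6532 | 0.8192 | 0.6086 | 0.7050 | 0.7653 | 0.6713 | 0.6829 | 0.7513 |
| (B) w/o | 0.9041 | 0.9387 | 0.6636 | 0.8245 | 0.6110 | 0.7082 | 0.7650 | 0.6845 | 0.6779 | 0.7530 |
| (C) w/o | 1.2372 | 1.0996 | 1.2454 | 1.2210 | 1.3543 | 1.3329 | 1.2424 | 1.2907 | 1.2339 | 1.2508 |
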
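The GEMS‑9 column of Table 2 appears to be the unweighted mean of the nine per‑dimension RMSE values. The caption does not state this, so the following Python check is only a sanity test on the rows that are fully available in the table; the row labels and the tolerance are choices made for this sketch.

```python
# Per-dimension RMSE values and the reported GEMS-9 score, copied from Table 2.
# Only rows with all nine per-dimension values present are included.
rows = {
    "MLP (musicnn)": (
        [0.9312, 0.9653, 0.7330, 0.8936, 0.6466, 0.8099, 0.8007, 0.7711, 0.7675],
        0.8132,
    ),
    "ablation A": (
        [0.9177, 0.9384, 0.6532, 0.8192, 0.6086, 0.7050, 0.7653, 0.6713, 0.6829],
        0.7513,
    ),
    "ablation B": (
        [0.9041, 0.9387, 0.6636, 0.8245, 0.6110, 0.7082, 0.7650, 0.6845, 0.6779],
        0.7530,
    ),
    "ablation C": (
        [1.2372, 1.0996, 1.2454, 1.2210, 1.3543, 1.3329, 1.2424, 1.2907, 1.2339],
        1.2508,
    ),
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Absolute gap between the recomputed mean and the reported GEMS-9 value.
gaps = {name: abs(mean(per_dim) - reported) for name, (per_dim, reported) in rows.items()}

# Allow for the four-decimal rounding of the table entries.
TOLERANCE = 2e-4
consistent = all(gap < TOLERANCE for gap in gaps.values())

for name, gap in gaps.items():
    print(f"{name}: |mean - reported| = {gap:.5f}")
```

For these four rows the recomputed means agree with the reported GEMS‑9 values to within rounding, which supports reading that column as a simple average over the nine dimensions.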
Figure 3

Performance impact of the number of layers in our wR‑GCN component.

Figure 4

Performance impact of the number of emotion profile clusters.

Figure 5

Model performance for different fractions of the training data, using Jukebox representations.

DOI: https://doi.org/10.5334/tismir.235 | Journal eISSN: 2514-3298
Language: English
Submitted on: Oct 29, 2024
Accepted on: Apr 30, 2025
Published on: Jun 11, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Andreas Peintner, Marta Moscati, Yu Kinoshita, Richard Vogl, Peter Knees, Markus Schedl, Hannah Strauss, Marcel Zentner, Eva Zangerle, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.