A Bimodal Deep Model to Capture Emotions from Music Tracks
Open Access | Mar 2025

References

  1. L. Smietanka and T. Maka, “Interpreting convolutional layers in DNN model based on time–frequency representation of emotional speech,” Journal of Artificial Intelligence and Soft Computing Research, vol. 14, no. 1, pp. 5–23, Jan. 2024, doi: 10.2478/jaiscr-2024-0001.
  2. S. Sheykhivand, Z. Mousavi, T. Y. Rezaii, and A. Farzamnia, “Recognizing Emotions Evoked by Music Using CNN-LSTM Networks on EEG Signals,” IEEE Access, vol. 8, pp. 139332-139345, 2020, doi: 10.1109/ACCESS.2020.3011882.
  3. Y. Takahashi, T. Hochin, and H. Nomiya, “Relationship between Mental States with Strong Emotion Aroused by Music Pieces and Their Feature Values,” in Proc. 2014 IIAI 3rd International Conference on Advanced Applied Informatics, 2014, pp. 718-725, doi: 10.1109/IIAIAAI.2014.147.
  4. P. A. Wood and S. K. Semwal, “On exploring the connection between music classification and evoking emotion,” in Proc. 2015 International Conference on Collaboration Technologies and Systems (CTS), 2015, pp. 474-476, doi: 10.1109/CTS.2015.7210471.
  5. M. Agapaki, E. A. Pinkerton, and E. Papatzikis, “Music and neuroscience research for mental health, cognition, and development: Ways forward,” Frontiers in Psychology, vol. 13, 2022, doi: 10.3389/fpsyg.2022.976883.
  6. Y. Song, S. Dixon, M. Pearce, and A. Halpern, “Perceived and Induced Emotion Responses to Popular Music: Categorical and Dimensional Models,” Music Perception: An Interdisciplinary Journal, vol. 33, pp. 472-492, Apr. 2016, doi: 10.1525/mp.2016.33.4.472.
  7. Y. Yuan, “Emotion of Music: Extraction and Composing,” Journal of Education, Humanities and Social Sciences, vol. 13, pp. 422-428, May 2023, doi: 10.54097/ehss.v13i.8207.
  8. S. A. Sujeesha, J. B. Mala, and R. Rajeev, “Automatic music mood classification using multi-modal attention framework,” Engineering Applications of Artificial Intelligence, vol. 128, p. 107355, 2024, doi: 10.1016/j.engappai.2023.107355.
  9. M. Schedl, P. Knees, B. McFee, D. Bogdanov, and M. Kaminskas, “Music recommender systems,” in Recommender Systems Handbook, Springer, 2015, pp. 453-492.
  10. MorphCast Technology. Available: https://www.morphcast.com. Accessed: November 2024.
  11. S. Zhao, G. Jia, J. Yang, G. Ding, and K. Keutzer, “Emotion Recognition From Multiple Modalities: Fundamentals and methodologies,” IEEE Signal Processing Magazine, vol. 38, no. 6, pp. 59-73, Nov. 2021, doi: 10.1109/msp.2021.3106895.
  12. T. Li, “Music emotion recognition using deep convolutional neural networks,” Journal of Computational Methods in Science and Engineering, vol. 24, no. 4-5, pp. 3063-3078, 2024, doi: 10.3233/JCM-247551.
  13. P. L. Louro, H. Redinho, R. Malheiro, R. P. Paiva, and R. Panda, “A comparison study of deep learning methodologies for music emotion recognition,” Sensors, vol. 24, no. 7, p. 2201, 2024, doi: 10.3390/s24072201.
  14. M. Blaszke, G. Korvel, and B. Kostek, “Exploring neural networks for musical instrument identification in polyphonic audio,” IEEE Intelligent Systems, pp. 1-11, 2024, doi: 10.1109/mis.2024.3392586.
  15. M. Barata and P. Coelho, “Music Streaming Services: Understanding the drivers of customer purchase and intention to recommend,” Heliyon, vol. 7, p. e07783, Aug. 2021, doi: 10.1016/j.heliyon.2021.e07783.
  16. J. Webster, “The promise of personalization: Exploring how music streaming platforms are shaping the performance of class identities and distinction,” New Media & Society, Jul. 2021, doi: 10.1177/14614448211027863.
  17. E. Schmidt, D. Turnbull, and Y. Kim, “Feature selection for content-based, time-varying musical emotion regression,” in Proc. ACM SIGMM International Conference on Multimedia Information Retrieval, Mar. 2010, pp. 267-274, doi: 10.1145/1743384.1743431.
  18. Y.-H. Yang, Y.-C. Lin, H.-T. Cheng, I.-B. Liao, Y.-C. Ho, and H. H. Chen, “Toward Multimodal Music Emotion Classification,” in Advances in Multimedia Information Processing - PCM 2008, 2008, pp. 70-79.
  19. T. Ciborowski, S. Reginis, D. Weber, A. Kurowski, and B. Kostek, “Classifying Emotions in Film Music—A Deep Learning Approach,” Electronics, vol. 10, no. 23, p. 2955, Nov. 2021, doi: 10.3390/electronics10232955.
  20. X. Han, F. Chen, and J. Ban, “Music Emotion Recognition Based on a Neural Network with an Inception-GRU Residual Structure,” Electronics, vol. 12, no. 4, p. 978, Feb. 2023, doi: 10.3390/electronics12040978.
  21. Y. J. Liao, W. C. Wang, S.-J. Ruan, Y. H. Lee, and S. C. Chen, “A Music Playback Algorithm Based on Residual-Inception Blocks for Music Emotion Classification and Physiological Information,” Sensors, vol. 22, no. 3, p. 777, Jan. 2022, doi: 10.3390/s22030777.
  22. R. Sarkar, S. Choudhury, S. Dutta, A. Roy, and S. K. Saha, “Recognition of emotion in music based on deep convolutional neural network,” Multimedia Tools and Applications, vol. 79, pp. 765-783, 2019, [Online]. Available: https://api.semanticscholar.org/CorpusID:254866914.
  23. S. Giammusso, M. Guerriero, P. Lisena, E. Palumbo, and R. Troncy, “Predicting the emotion of playlists using track lyrics,” in International Society for Music Information Retrieval (ISMIR), Late-Breaking Session, 2017.
  24. Y. Agrawal, R. Shanker, and V. Alluri, “Transformer-based approach towards music emotion recognition from lyrics,” in Advances in Information Retrieval, ECIR 2021, Lecture Notes in Computer Science, vol. 12657, Springer, 2021, doi: 10.1007/978-3-030-72240-1_12.
  25. D. Han, Y. Kong, J. Han, and G. Wang, “A survey of music emotion recognition,” Frontiers of Computer Science, vol. 16, Dec. 2022, doi: 10.1007/s11704-021-0569-4.
  26. T. Baltrušaitis, C. Ahuja, and L. -P. Morency, “Multimodal Machine Learning: A Survey and Taxonomy,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 423-443, 1 Feb. 2019, doi: 10.1109/TPAMI.2018.2798607.
  27. R. Delbouys, R. Hennequin, F. Piccoli, J. Royo-Letelier, and M. Moussallam, “Music Mood Detection Based On Audio And Lyrics With Deep Neural Net,” in Proc. ISMIR 2018, doi: 10.48550/arXiv.1809.07276.
  28. I. A. P. Santana et al., “Music4all: A new music database and its applications,” in Proc. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 2020, pp. 399-404, doi: 10.1109/IWSSIP48289.2020.9145170.
  29. E. Çano and M. Morisio, “Moodylyrics: A sentiment annotated lyrics dataset,” in Proc. 2017 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (ISMSI), 2017, pp. 118-124, doi: 10.1145/3059336.3059340.
  30. E. Çano and M. Morisio, “Music mood dataset creation based on Last.fm tags,” in Proc. 2017 International Conference on Artificial Intelligence and Applications, Vienna, Austria, 2017, pp. 15-26, doi: 10.5121/csit.2017.70603.
  31. R. E. Thayer, The Biopsychology of Mood and Arousal. Oxford University Press, 1989.
  32. J. Russell, “A Circumplex Model of Affect,” Journal of Personality and Social Psychology, vol. 39, pp. 1161-1178, Dec. 1980, doi: 10.1037/h0077714.
  33. Social music service - Last.fm. Available: https://www.last.fm/. Accessed: November 2024.
  34. Genius - Song Lyrics & Knowledge. Available: https://genius.com/. Accessed: November 2024.
  35. YouTube. Available: https://www.youtube.com. Accessed: November 2024.
  36. M. Sakowicz and J. Tobolewski, “Development and study of an algorithm for the automatic labeling of musical pieces in the context of emotion evoked,” M.Sc. thesis, Gdansk University of Technology and Universitat Politècnica de Catalunya (co-supervised by B. Kostek and J. Turmo), 2023.
  37. Genius and Spotify partnering. Available: https://genius.com/a/genius-and-spotify-together. Accessed: November 2024.
  38. Pafy library. Available: https://pypi.org/project/pafy/. Accessed: November 2024.
  39. Moviepy library. Available: https://pypi.org/project/moviepy/. Accessed: November 2024.
  40. M. Honnibal and I. Montani, “spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing,” 2017. Available: https://github.com/explosion/spaCy. Accessed: November 2024.
  41. P. N. Johnson-Laird and K. Oatley, “Emotions, Simulation, and Abstract Art,” Art & Perception, vol. 9, no. 3, pp. 260-292, 2021, doi: 10.1163/22134913-bja10029.
  42. P. N. Johnson-Laird and K. Oatley, “How poetry evokes emotions,” Acta Psychologica, vol. 224, p. 103506, 2022, doi: 10.1016/j.actpsy.2022.103506.
  43. J. Pennington, R. Socher, and C. Manning, “GloVe: Global Vectors for Word Representation,” in Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Oct. 2014, pp. 1532-1543, doi: 10.3115/v1/D14-1162.
  44. SpaCy - pre-trained pipeline for English. Available: https://spacy.io/models/en#en_core_web_lg. Accessed: November 2024.
  45. S. Loria, “Textblob Documentation,” Release 0.15, vol. 2, 2018. Available: https://textblob.readthedocs.io/en/dev/. Accessed: November 2024.
  46. F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825-2830, 2011. Available: http://jmlr.org/papers/v12/pedregosa11a.html. Accessed: November 2024.
  47. “Paradise City” by Guns N’ Roses. Available: https://genius.com/Guns-n-roses-paradise-city-lyrics.
  48. FastText - text classification tutorial. Available: https://fasttext.cc/docs/en/supervised-tutorial.html. Accessed: November 2024.
  49. T. Wolf et al., “Transformers: State-of-the-Art Natural Language Processing,” in Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38-45, doi: 10.18653/v1/2020.emnlp-demos.6.
  50. XLNet (base-sized model). Available: https://huggingface.co/xlnet-base-cased. Accessed: November 2024.
  51. Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” in Advances in Neural Information Processing Systems, vol. 32, 2019, doi: 10.48550/arXiv.1906.08237.
  52. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 2818-2826, doi: 10.1109/CVPR.2016.308.
  53. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
  54. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. 3rd International Conference on Learning Representations (ICLR 2015), 2015, pp. 1-14, doi: 10.48550/arXiv.1409.1556.
  55. Librosa library. Available: https://librosa.org/. Accessed: November 2024.
  56. F. Chollet et al., “Keras,” 2015. Available: https://github.com/fchollet/keras. Accessed: November 2024.
  57. TensorFlow library. Available: https://www.tensorflow.org/. Accessed: November 2024.
  58. S. C. Huang, A. Pareek, S. Seyyedi, I. Banerjee, and M. Lungren, “Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines,” npj Digital Medicine, vol. 3, 12, 2020, doi: 10.1038/s41746-020-00341-z.
  59. A. Paszke et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” in Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., 2019, pp. 8024-8035.
  60. Combining two deep learning models. Available: https://control.com/technical-articles/combining-two-deep-learning-models/. Accessed: November 2024.
  61. Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, A. Hanjalic, and N. Oliver, “TFMAP: Optimizing MAP for top-n context-aware recommendation,” in Proc. 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, OR, USA, Aug. 2012, pp. 155-164, doi: 10.1145/2348283.2348308.
  62. K. Pyrovolakis, P. K. Tzouveli, and G. Stamou, “Multi-Modal Song Mood Detection with Deep Learning,” Sensors, vol. 22, 2022, doi: 10.3390/s22031065.
  63. E. N. Shaday, V. J. L. Engel, and H. Heryanto, “Application of the Bidirectional Long Short-Term Memory Method with Comparison of Word2Vec, GloVe, and FastText for Emotion Classification in Song Lyrics,” Procedia Computer Science, vol. 245, pp. 137-146, 2024, doi: 10.1016/j.procs.2024.10.237.
Language: English
Page range: 215-235
Submitted on: Nov 28, 2024
Accepted on: Feb 21, 2025
Published on: Mar 18, 2025
Published by: SAN University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2025 Jan Tobolewski, Michał Sakowicz, Jordi Turmo, Bożena Kostek, published by SAN University
This work is licensed under the Creative Commons Attribution 4.0 License.