
Instrument Classification in Musical Audio Signals using Deep Learning

Open Access | Jul 2025

Abstract

The intersection of artificial intelligence and music technology is creating new possibilities for cultural preservation and innovation. This study aims to utilise this technology by optimising deep learning models for accurate instrument classification, thereby contributing to advances in music recognition, database organisation, and educational transcription tasks. Using the IRMAS dataset, we evaluated several neural network architectures, including DenseNet121, ResNet-50, and ConvNeXt, trained on log-Mel spectrograms of segmented audio clips to capture the distinctive acoustic features of each instrument. Results indicate that DenseNet121 achieved the highest classification accuracy, with notable precision, recall, and F1-scores compared with the other models. However, the models struggled to recognise instruments with fewer training samples, such as the clarinet and cello, underscoring the importance of balanced datasets. Although data augmentation only partially addressed this class imbalance, the findings offer valuable insights into designing robust music processing systems and highlight areas for improvement in feature extraction and data handling. This study contributes to the development of AI-driven tools in music, offering potential benefits for cultural and educational growth.
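As a rough illustration of the pipeline the abstract describes, the sketch below computes a log-Mel spectrogram with librosa and feeds it to a torchvision DenseNet121 whose classifier head is resized for the 11 IRMAS instrument labels. The sample rate, Mel-band count, FFT size, and file name are illustrative assumptions, not values reported by the authors.

```python
import librosa
import numpy as np
import torch
import torchvision.models as models

def log_mel_spectrogram(path, sr=22050, n_mels=128, n_fft=2048, hop_length=512):
    """Load an audio clip and convert it to a log-scaled Mel spectrogram (dB)."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

# DenseNet121 with its classifier replaced for the 11 IRMAS instrument classes.
# The stock 3-channel input layer is kept, so the single-channel spectrogram
# is simply repeated across channels.
num_classes = 11
model = models.densenet121(weights=None)
model.classifier = torch.nn.Linear(model.classifier.in_features, num_classes)

spec = log_mel_spectrogram("example_clip.wav")            # hypothetical clip, shape (n_mels, frames)
x = torch.tensor(spec).unsqueeze(0).repeat(3, 1, 1)       # (3, n_mels, frames)
logits = model(x.unsqueeze(0).float())                    # (1, num_classes)
```

In practice the same loop would be repeated for ResNet-50 and ConvNeXt by swapping the backbone and its final classification layer.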

Language: English
Page range: 84 - 99
Submitted on: Jun 19, 2024
Accepted on: Jan 4, 2025
Published on: Jul 3, 2025
Published by: Međimurje University of Applied Sciences in Čakovec
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2025 Karlo Borovčak, Marina Bagić Babac, published by Međimurje University of Applied Sciences in Čakovec
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.