
Multi-Objective Investigation of Six Feature Source Types for Multi-Modal Music Classification

By: Igor Vatolkin and Cory McKay
Open Access | Jan 2022

References

  1. Amaldi, E., and Kann, V. (1998). On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 209(1–2):237–260. DOI: 10.1016/S0304-3975(97)00115-1
  2. Audet, C., Bigeon, J., Cartier, D., Digabel, S. L., and Salomon, L. (2021). Performance indicators in multiobjective optimization. European Journal of Operational Research, 292(2):397–422. DOI: 10.1016/j.ejor.2020.11.016
  3. Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., and Lamere, P. (2011). The Million Song Dataset. In Proc. of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 591–596.
  4. Bischoff, K., Firan, C. S., Paiu, R., Nejdl, W., Laurier, C., and Sordo, M. (2009). Music mood and theme classification – a hybrid approach. In Proc. of the 10th International Society for Music Information Retrieval Conference, ISMIR, pages 657–662.
  5. Bogdanov, D., Porter, A., Schreiber, H., Urbano, J., and Oramas, S. (2019). The AcousticBrainz Genre Dataset: Multi-source, multi-level, multi-label, and large-scale. In Proc. of the 20th International Society for Music Information Retrieval Conference, ISMIR, pages 360–367.
  6. Bonnin, G., and Jannach, D. (2014). Automated generation of music playlists: Survey and experiments. ACM Computing Surveys, 47(2):26:1–26:35. DOI: 10.1145/2652481
  7. Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32. DOI: 10.1023/A:1010933404324
  8. Cataltepe, Z., Yaslan, Y., and Sonmez, A. (2007). Music genre classification using MIDI and audio features. EURASIP Journal on Advances in Signal Processing, 2007(1):150–150. DOI: 10.1155/2007/36409
  9. Celma, Ò. (2010). Music Recommendation and Discovery – The Long Tail, Long Fail, and Long Play in the Digital Music Space. Springer. DOI: 10.1007/978-3-642-13287-2
  10. Choi, K., Fazekas, G., Sandler, M. B., and Cho, K. (2017). Transfer learning for music classification and regression tasks. In Proc. of the 18th International Society for Music Information Retrieval Conference, ISMIR, pages 141–149.
  11. Costa, Y. M. G., Oliveira, L. S., and Silla, C. N., Jr. (2017). An evaluation of convolutional neural networks for music classification using spectrograms. Applied Soft Computing, 52:28–38. DOI: 10.1016/j.asoc.2016.12.024
  12. Csurka, G., Dance, C. R., Fan, L., Willamowski, J., and Bray, C. (2004). Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1–22.
  13. Dannenberg, R. B., Thom, B., and Watson, D. (1997). A machine learning approach to musical style recognition. In Proc. of the International Computer Music Conference, ICMC, pages 344–347.
  14. Dhanaraj, R., and Logan, B. (2005). Automatic prediction of hit songs. In Proc. of the 6th International Conference on Music Information Retrieval, ISMIR, pages 488–491.
  15. Doraisamy, S., Golzari, S., Norowi, N. M., Sulaiman, M. N., and Udzir, N. I. (2008). A study on feature selection and classification techniques for automatic genre classification of traditional Malay music. In Bello, J. P., Chew, E., and Turnbull, D., editors, Proc. of the 9th International Conference on Music Information Retrieval, ISMIR, pages 331–336.
  16. Dunker, P., Nowak, S., Begau, A., and Lanz, C. (2008). Content-based mood classification for photos and music: A generic multi-modal classification framework and evaluation approach. In Proc. of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, MIR, pages 97–104. DOI: 10.1145/1460096.1460114
  17. Fiebrink, R., and Fujinaga, I. (2006). Feature selection pitfalls and music classification. In Proc. of the 7th International Conference on Music Information Retrieval, ISMIR, pages 340–341.
  18. Fujinaga, I. (1998). Machine recognition of timbre using steady-state tone of acoustic musical instruments. In Proc. of the International Computer Music Conference, ICMC, pages 207–210.
  19. Guyon, I., Nikravesh, M., Gunn, S., and Zadeh, L. A., editors (2006). Feature Extraction: Foundations and Applications, volume 207 of Studies in Fuzziness and Soft Computing. Springer, Berlin Heidelberg. DOI: 10.1007/978-3-540-35488-8
  20. Harris, Z. S. (1954). Distributional structure. WORD, 10(2–3):146–162. DOI: 10.1080/00437956.1954.11659520
  21. Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer, New York. DOI: 10.1007/978-0-387-84858-7
  22. Hu, X., Choi, K., and Downie, J. S. (2017). A framework for evaluating multimodal music mood classification. Journal of the Association for Information Science and Technology, 68(2):273–285. DOI: 10.1002/asi.23649
  23. Huang, Y., Lin, S., Wu, H., and Li, Y. (2014). Music genre classification based on local feature selection using a self-adaptive harmony search algorithm. Data & Knowledge Engineering, 92:60–76. DOI: 10.1016/j.datak.2014.07.005
  24. Jannach, D., Vatolkin, I., and Bonnin, G. (2017). Music data: Beyond the signal level. In Weihs, C., Jannach, D., Vatolkin, I., and Rudolph, G., editors, Music Data Analysis: Foundations and Applications, pages 197–215. CRC Press.
  25. Knees, P., and Schedl, M. (2013). A survey of music similarity and recommendation from music context data. ACM Transactions on Multimedia Computing, Communications and Applications, 10(1):2:1–2:21. DOI: 10.1145/2542205.2542206
  26. Kohavi, R., and John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1–2):273–324. DOI: 10.1016/S0004-3702(97)00043-X
  27. Kudo, M., and Sklansky, J. (2000). Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33(1):25–41. DOI: 10.1016/S0031-3203(99)00041-2
  28. Lamere, P. (2008). Social tagging and music information retrieval. Journal of New Music Research, 37(2):101–114. DOI: 10.1080/09298210802479284
  29. Lartillot, O., and Toiviainen, P. (2007). MIR in Matlab (II): A toolbox for musical feature extraction from audio. In Proc. of the 8th International Conference on Music Information Retrieval, ISMIR, pages 127–130.
  30. Laurier, C., Grivolla, J., and Herrera, P. (2008). Multimodal music mood classification using audio and lyrics. In Seventh International Conference on Machine Learning and Applications, pages 688–693. DOI: 10.1109/ICMLA.2008.96
  31. Le, Q., and Mikolov, T. (2014). Distributed representations of sentences and documents. In Proc. of the 31st International Conference on Machine Learning, ICML, volume 32, pages 1188–1196. JMLR.org.
  32. Lim, S.-C., Lee, J.-S., Jang, S.-J., Lee, S.-P., and Kim, M. Y. (2012). Music-genre classification system based on spectro-temporal features and feature selection. IEEE Transactions on Consumer Electronics, 58(4):1262–1268. DOI: 10.1109/TCE.2012.6414994
  33. Logan, B., Kositsky, A., and Moreno, P. (2004). Semantic analysis of song lyrics. In IEEE International Conference on Multimedia and Expo, ICME, volume 2, pages 827–830. DOI: 10.1109/ICME.2004.1394328
  34. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110. DOI: 10.1023/B:VISI.0000029664.99615.94
  35. Martí, R., Lozano, J. A., Mendiburu, A., and Hernando, L. (2018). Multi-start methods. In Martí, R., Pardalos, P. M., and Resende, M. G. C., editors, Handbook of Heuristics, pages 155–175. Springer International Publishing, Cham. DOI: 10.1007/978-3-319-07124-4_1
  36. Martin, R., and Nagathil, A. M. (2009). Cepstral modulation ratio regression (CMRARE) parameters for audio signal analysis and classification. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, pages 321–324. DOI: 10.1109/ICASSP.2009.4959585
  37. Mauch, M., and Levy, M. (2011). Structural change on multiple time scales as a correlate of musical complexity. In Klapuri, A., and Leider, C., editors, Proc. of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 489–494.
  38. Mayer, R., and Rauber, A. (2010). Multimodal aspects of music retrieval: Audio, song lyrics – and beyond? In Ras, Z. W., and Wieczorkowska, A., editors, Advances in Music Information Retrieval, pages 333–363. Springer. DOI: 10.1007/978-3-642-11674-2_15
  39. Mayer, R., Rauber, A., de León, P. J. P., Pérez-Sancho, C., and Iñesta, J. M. (2010). Feature selection in a Cartesian ensemble of feature subspace classifiers for music categorisation. In Proc. of the 3rd International Workshop on Machine Learning and Music, MML, pages 53–56. ACM. DOI: 10.1145/1878003.1878021
  40. McFee, B., and Lanckriet, G. R. G. (2012). Hypergraph models of playlist dialects. In Proc. of the 13th International Society for Music Information Retrieval Conference, ISMIR, pages 343–348.
  41. McKay, C., Burgoyne, J. A., Hockman, J., Smith, J. B. L., Vigliensoni, G., and Fujinaga, I. (2010). Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. In Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR, pages 213–218.
  42. McKay, C., Cumming, J., and Fujinaga, I. (2018). jSymbolic 2.2: Extracting features from symbolic music for use in musicological and MIR research. In Proc. of the 19th International Society for Music Information Retrieval Conference, ISMIR, pages 348–354.
  43. McKay, C., and Fujinaga, I. (2006). Musical genre classification: Is it worth pursuing and how can it be improved? In Proc. of the 7th International Conference on Music Information Retrieval, ISMIR, pages 101–106.
  44. McKay, C., and Fujinaga, I. (2008). Combining features extracted from audio, symbolic and cultural sources. In Proc. of the 9th International Conference on Music Information Retrieval, ISMIR, pages 597–602.
  45. Meseguer-Brocal, G., Cohen-Hadria, A., and Peeters, G. (2018). DALI: A large dataset of synchronized audio, lyrics and notes, automatically created using teacher-student machine learning paradigm. In Proc. of the 19th International Society for Music Information Retrieval Conference, ISMIR, pages 431–437.
  46. Müller, M., and Ewert, S. (2011). Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In Proc. of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 215–220.
  47. Neumayer, R., and Rauber, A. (2007). Integration of text and audio features for genre classification in music information retrieval. In Proc. of the 29th European Conference on IR Research, ECIR, pages 724–727. DOI: 10.1007/978-3-540-71496-5_78
  48. Oramas, S., Barbieri, F., Nieto, O., and Serra, X. (2018). Multimodal deep learning for music genre classification. Transactions of the International Society for Music Information Retrieval, 1(1):4–21. DOI: 10.5334/tismir.10
  49. Oramas, S., Nieto, O., Barbieri, F., and Serra, X. (2017). Multi-label music genre classification from audio, text and images using deep features. In Proc. of the 18th International Society for Music Information Retrieval Conference, ISMIR, pages 23–30.
  50. Orio, N., Rizo, D., Miotto, R., Schedl, M., Montecchio, N., and Lartillot, O. (2011). MusiCLEF: A benchmark activity in multimodal music information retrieval. In Proc. of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 603–608.
  51. Panda, R., Malheiro, R., Rocha, B., Oliveira, A., and Paiva, R. P. (2013). Multi-modal music emotion recognition: A new dataset, methodology and comparative analysis. In Proc. of the 10th International Symposium on Computer Music Multidisciplinary Research, CMMR. Springer.
  52. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
  53. Raffel, C. (2016). Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching. PhD thesis, Graduate School of Arts and Sciences, Columbia University. DOI: 10.1109/ICASSP.2016.7471641
  54. Řehůřek, R., and Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proc. of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50.
  55. Reunanen, J. (2003). Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research, 3:1371–1382.
  56. Rötter, G., Vatolkin, I., and Weihs, C. (2013). Computational prediction of high-level descriptors of music personal categories. In Lausen, B., den Poel, D. V., and Ultsch, A., editors, Algorithms from and for Nature and Life – Classification and Data Analysis, Studies in Classification, Data Analysis, and Knowledge Organization, pages 529–537. Springer. DOI: 10.1007/978-3-319-00035-0_54
  57. Saari, P., Eerola, T., and Lartillot, O. (2011). Generalizability and simplicity as criteria in feature selection: Application to mood classification in music. IEEE Transactions on Audio, Speech, and Language Processing, 19(6):1802–1812. DOI: 10.1109/TASL.2010.2101596
  58. Schindler, A. (2019). Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis. PhD thesis, Faculty of Informatics, TU Wien.
  59. Schreiber, H. (2015). Improving genre annotations for the Million Song Dataset. In Proc. of the 16th International Society for Music Information Retrieval Conference, ISMIR, pages 241–247.
  60. Sigtia, S., and Dixon, S. (2014). Improved music feature learning with deep neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pages 6959–6963. DOI: 10.1109/ICASSP.2014.6854949
  61. Silla, C. N., Jr., Koerich, A. L., and Kaestner, C. A. A. (2009). A feature selection approach for automatic music genre classification. International Journal of Semantic Computing, 3(2):183–208. DOI: 10.1142/S1793351X09000719
  62. Simonetta, F., Ntalampiras, S., and Avanzini, F. (2019). Multimodal music information processing and retrieval: Survey and future challenges. In Proc. of the International Workshop on Multilayer Music Representation and Processing, MMRP, pages 10–18. DOI: 10.1109/MMRP.2019.00012
  63. Sturm, B. L. (2012a). A survey of evaluation in music genre recognition. In 10th International Workshop on Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation, AMR, pages 29–66. DOI: 10.1007/978-3-319-12093-5_2
  64. Sturm, B. L. (2012b). Two systems for automatic music genre recognition: What are they really recognizing? In Proc. of the 2nd International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies, MIRUM, pages 69–74. DOI: 10.1145/2390848.2390866
  65. Sturm, B. L. (2013a). Classification accuracy is not enough. Journal of Intelligent Information Systems, 41(3):371–406. DOI: 10.1007/s10844-013-0250-y
  66. Sturm, B. L. (2013b). Evaluating music emotion recognition: Lessons from music genre recognition? In IEEE International Conference on Multimedia and Expo Workshops, ICMEW, pages 1–6. DOI: 10.1109/ICMEW.2013.6618342
  67. Tzanetakis, G., and Cook, P. R. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302. DOI: 10.1109/TSA.2002.800560
  68. Vatolkin, I. (2015). Exploration of two-objective scenarios on supervised evolutionary feature selection: A survey and a case study (application to music categorisation). In Proc. of the 8th International Conference on Evolutionary Multi-Criterion Optimization, pages 529–543. Springer. DOI: 10.1007/978-3-319-15892-1_36
  69. Vatolkin, I., Bonnin, G., and Jannach, D. (2014). Comparing audio features and playlist statistics for music classification. In Analysis of Large and Complex Data – Second European Conference on Data Analysis, ECDA, pages 437–447. DOI: 10.1007/978-3-319-25226-1_37
  70. Vatolkin, I., Preuß, M., and Rudolph, G. (2011). Multiobjective feature selection in music genre and style recognition tasks. In Krasnogor, N., and Lanzi, P. L., editors, Proc. of the 13th Annual Genetic and Evolutionary Computation Conference, GECCO, pages 411–418. ACM Press. DOI: 10.1145/2001576.2001633
  71. Vatolkin, I., Rudolph, G., and Weihs, C. (2015). Evaluation of album effect for feature selection in music genre recognition. In Proc. of the 16th International Society for Music Information Retrieval Conference, ISMIR, pages 169–175.
  72. Vatolkin, I., Theimer, W. M., and Botteck, M. (2010). AMUSE (Advanced MUSic Explorer): A multitool framework for music data analysis. In Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR, pages 33–38.
  73. Weihs, C., Jannach, D., Vatolkin, I., and Rudolph, G., editors (2017). Music Data Analysis: Foundations and Applications. CRC Press. DOI: 10.1201/9781315370996
  74. Wilkes, B. (2019). Analyse von bild-, text- und audiobasierten Merkmalen für die Klassifikation von Musikgenres [Analysis of image-, text-, and audio-based features for the classification of music genres]. Master’s thesis, Department of Computer Science, TU Dortmund.
  75. Witten, I. H., Frank, E., Hall, M. A., and Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, Burlington, Massachusetts.
  76. Zangerle, E., Tschuggnall, M., Wurzinger, S., and Specht, G. (2018). ALF-200k: Towards extensive multimodal analyses of music tracks and playlists. In Pasi, G., Piwowarski, B., Azzopardi, L., and Hanbury, A., editors, Advances in Information Retrieval, pages 584–590. Springer. DOI: 10.1007/978-3-319-76941-7_48
  77. Zitzler, E. (2012). Evolutionary multiobjective optimization. In Rozenberg, G., Bäck, T., and Kok, J. N., editors, Handbook of Natural Computing, Volume 2, pages 871–904. Springer, Berlin Heidelberg. DOI: 10.1007/978-3-540-92910-9_28
DOI: https://doi.org/10.5334/tismir.67 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jun 17, 2020
Accepted on: Dec 2, 2021
Published on: Jan 24, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Igor Vatolkin, Cory McKay, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.