
Multi-Objective Investigation of Six Feature Source Types for Multi-Modal Music Classification

By: Igor Vatolkin and Cory McKay
Open Access | Jan 2022

References

  1. Amaldi, E., and Kann, V. (1998). On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 209(1–2):237–260. DOI: 10.1016/S0304-3975(97)00115-1
  2. Audet, C., Bigeon, J., Cartier, D., Digabel, S. L., and Salomon, L. (2021). Performance indicators in multiobjective optimization. European Journal of Operational Research, 292(2):397–422. DOI: 10.1016/j.ejor.2020.11.016
  3. Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., and Lamere, P. (2011). The Million Song Dataset. In Proc. of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 591–596.
  4. Bischoff, K., Firan, C. S., Paiu, R., Nejdl, W., Laurier, C., and Sordo, M. (2009). Music mood and theme classification – a hybrid approach. In Proc. of the 10th International Society for Music Information Retrieval Conference, ISMIR, pages 657–662.
  5. Bogdanov, D., Porter, A., Schreiber, H., Urbano, J., and Oramas, S. (2019). The AcousticBrainz Genre Dataset: Multi-source, multi-level, multi-label, and large-scale. In Proc. of the 20th International Society for Music Information Retrieval Conference, ISMIR, pages 360–367.
  6. Bonnin, G., and Jannach, D. (2014). Automated generation of music playlists: Survey and experiments. ACM Computing Surveys, 47(2):26:1–26:35. DOI: 10.1145/2652481
  7. Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32. DOI: 10.1023/A:1010933404324
  8. Cataltepe, Z., Yaslan, Y., and Sonmez, A. (2007). Music genre classification using MIDI and audio features. EURASIP Journal on Advances in Signal Processing, 2007(1):150–150. DOI: 10.1155/2007/36409
  9. Celma, Ò. (2010). Music Recommendation and Discovery – The Long Tail, Long Fail, and Long Play in the Digital Music Space. Springer. DOI: 10.1007/978-3-642-13287-2
  10. Choi, K., Fazekas, G., Sandler, M. B., and Cho, K. (2017). Transfer learning for music classification and regression tasks. In Proc. of the 18th International Society for Music Information Retrieval Conference, ISMIR, pages 141–149.
  11. Costa, Y. M. G., Oliveira, L. S., and Silla, C. N., Jr. (2017). An evaluation of convolutional neural networks for music classification using spectrograms. Applied Soft Computing, 52:28–38. DOI: 10.1016/j.asoc.2016.12.024
  12. Csurka, G., Dance, C. R., Fan, L., Willamowski, J., and Bray, C. (2004). Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1–22.
  13. Dannenberg, R. B., Thom, B., and Watson, D. (1997). A machine learning approach to musical style recognition. In Proc. of the International Computer Music Conference, ICMC, pages 344–347.
  14. Dhanaraj, R., and Logan, B. (2005). Automatic prediction of hit songs. In Proc. of the 6th International Conference on Music Information Retrieval, ISMIR, pages 488–491.
  15. Doraisamy, S., Golzari, S., Norowi, N. M., Sulaiman, M. N., and Udzir, N. I. (2008). A study on feature selection and classification techniques for automatic genre classification of traditional Malay music. In Bello, J. P., Chew, E., and Turnbull, D., editors, Proc. of the 9th International Conference on Music Information Retrieval, ISMIR, pages 331–336.
  16. Dunker, P., Nowak, S., Begau, A., and Lanz, C. (2008). Content-based mood classification for photos and music: A generic multi-modal classification framework and evaluation approach. In Proc. of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, MIR, pages 97–104. DOI: 10.1145/1460096.1460114
  17. Fiebrink, R., and Fujinaga, I. (2006). Feature selection pitfalls and music classification. In Proc. of the 7th International Conference on Music Information Retrieval, ISMIR, pages 340–341.
  18. Fujinaga, I. (1998). Machine recognition of timbre using steady-state tone of acoustic musical instruments. In Proc. of the International Computer Music Conference, ICMC, pages 207–210.
  19. Guyon, I., Nikravesh, M., Gunn, S., and Zadeh, L. A., editors (2006). Feature Extraction: Foundations and Applications, volume 207 of Studies in Fuzziness and Soft Computing. Springer, Berlin Heidelberg. DOI: 10.1007/978-3-540-35488-8
  20. Harris, Z. S. (1954). Distributional structure. WORD, 10(2–3):146–162. DOI: 10.1080/00437956.1954.11659520
  21. Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer, New York. DOI: 10.1007/978-0-387-84858-7
  22. Hu, X., Choi, K., and Downie, J. S. (2017). A framework for evaluating multimodal music mood classification. Journal of the Association for Information Science and Technology, 68(2):273–285. DOI: 10.1002/asi.23649
  23. Huang, Y., Lin, S., Wu, H., and Li, Y. (2014). Music genre classification based on local feature selection using a self-adaptive harmony search algorithm. Data & Knowledge Engineering, 92:60–76. DOI: 10.1016/j.datak.2014.07.005
  24. Jannach, D., Vatolkin, I., and Bonnin, G. (2017). Music data: Beyond the signal level. In Weihs, C., Jannach, D., Vatolkin, I., and Rudolph, G., editors, Music Data Analysis: Foundations and Applications, pages 197–215. CRC Press.
  25. Knees, P., and Schedl, M. (2013). A survey of music similarity and recommendation from music context data. ACM Transactions on Multimedia Computing, Communications and Applications, 10(1):2:1–2:21. DOI: 10.1145/2542205.2542206
  26. Kohavi, R., and John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1–2):273–324. DOI: 10.1016/S0004-3702(97)00043-X
  27. Kudo, M., and Sklansky, J. (2000). Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33(1):25–41. DOI: 10.1016/S0031-3203(99)00041-2
  28. Lamere, P. (2008). Social tagging and music information retrieval. Journal of New Music Research, 37(2):101–114. DOI: 10.1080/09298210802479284
  29. Lartillot, O., and Toiviainen, P. (2007). MIR in Matlab (II): A toolbox for musical feature extraction from audio. In Proc. of the 8th International Conference on Music Information Retrieval, ISMIR, pages 127–130.
  30. Laurier, C., Grivolla, J., and Herrera, P. (2008). Multimodal music mood classification using audio and lyrics. In Seventh International Conference on Machine Learning and Applications, pages 688–693. DOI: 10.1109/ICMLA.2008.96
  31. Le, Q., and Mikolov, T. (2014). Distributed representations of sentences and documents. In Proc. of the 31st International Conference on Machine Learning, ICML, volume 32, pages 1188–1196. JMLR.org.
  32. Lim, S.-C., Lee, J.-S., Jang, S.-J., Lee, S.-P., and Kim, M. Y. (2012). Music-genre classification system based on spectro-temporal features and feature selection. IEEE Transactions on Consumer Electronics, 58(4):1262–1268. DOI: 10.1109/TCE.2012.6414994
  33. Logan, B., Kositsky, A., and Moreno, P. (2004). Semantic analysis of song lyrics. In IEEE International Conference on Multimedia and Expo, ICME, volume 2, pages 827–830. DOI: 10.1109/ICME.2004.1394328
  34. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110. DOI: 10.1023/B:VISI.0000029664.99615.94
  35. Martí, R., Lozano, J. A., Mendiburu, A., and Hernando, L. (2018). Multi-start methods. In Martí, R., Pardalos, P. M., and Resende, M. G. C., editors, Handbook of Heuristics, pages 155–175. Springer International Publishing, Cham. DOI: 10.1007/978-3-319-07124-4_1
  36. Martin, R., and Nagathil, A. M. (2009). Cepstral modulation ratio regression (CMRARE) parameters for audio signal analysis and classification. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, pages 321–324. DOI: 10.1109/ICASSP.2009.4959585
  37. Mauch, M., and Levy, M. (2011). Structural change on multiple time scales as a correlate of musical complexity. In Klapuri, A., and Leider, C., editors, Proc. of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 489–494.
  38. Mayer, R., and Rauber, A. (2010). Multimodal aspects of music retrieval: Audio, song lyrics – and beyond? In Ras, Z. W., and Wieczorkowska, A., editors, Advances in Music Information Retrieval, pages 333–363. Springer. DOI: 10.1007/978-3-642-11674-2_15
  39. Mayer, R., Rauber, A., de León, P. J. P., Pérez-Sancho, C., and Iñesta, J. M. (2010). Feature selection in a Cartesian ensemble of feature subspace classifiers for music categorisation. In Proc. of the 3rd International Workshop on Machine Learning and Music, MML, pages 53–56. ACM. DOI: 10.1145/1878003.1878021
  40. McFee, B., and Lanckriet, G. R. G. (2012). Hypergraph models of playlist dialects. In Proc. of the 13th International Society for Music Information Retrieval Conference, ISMIR, pages 343–348.
  41. McKay, C., Burgoyne, J. A., Hockman, J., Smith, J. B. L., Vigliensoni, G., and Fujinaga, I. (2010). Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. In Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR, pages 213–218.
  42. McKay, C., Cumming, J., and Fujinaga, I. (2018). jSymbolic 2.2: Extracting features from symbolic music for use in musicological and MIR research. In Proc. of the 19th International Society for Music Information Retrieval Conference, ISMIR, pages 348–354.
  43. McKay, C., and Fujinaga, I. (2006). Musical genre classification: Is it worth pursuing and how can it be improved? In Proc. of the 7th International Conference on Music Information Retrieval, ISMIR, pages 101–106.
  44. McKay, C., and Fujinaga, I. (2008). Combining features extracted from audio, symbolic and cultural sources. In Proc. of the 9th International Conference on Music Information Retrieval, ISMIR, pages 597–602.
  45. Meseguer-Brocal, G., Cohen-Hadria, A., and Peeters, G. (2018). DALI: A large dataset of synchronized audio, lyrics and notes, automatically created using teacher-student machine learning paradigm. In Proc. of the 19th International Society for Music Information Retrieval Conference, ISMIR, pages 431–437.
  46. Müller, M., and Ewert, S. (2011). Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In Proc. of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 215–220.
  47. Neumayer, R., and Rauber, A. (2007). Integration of text and audio features for genre classification in music information retrieval. In Proc. of the 29th European Conference on IR Research, ECIR, pages 724–727. DOI: 10.1007/978-3-540-71496-5_78
  48. Oramas, S., Barbieri, F., Nieto, O., and Serra, X. (2018). Multimodal deep learning for music genre classification. Transactions of the International Society for Music Information Retrieval, 1(1):4–21. DOI: 10.5334/tismir.10
  49. Oramas, S., Nieto, O., Barbieri, F., and Serra, X. (2017). Multi-label music genre classification from audio, text and images using deep features. In Proc. of the 18th International Society for Music Information Retrieval Conference, ISMIR, pages 23–30.
  50. Orio, N., Rizo, D., Miotto, R., Schedl, M., Montecchio, N., and Lartillot, O. (2011). MusiCLEF: A benchmark activity in multimodal music information retrieval. In Proc. of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 603–608.
  51. Panda, R., Malheiro, R., Rocha, B., Oliveira, A., and Paiva, R. P. (2013). Multi-modal music emotion recognition: A new dataset, methodology and comparative analysis. In Proc. of the 10th International Symposium on Computer Music Multidisciplinary Research, CMMR. Springer.
  52. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
  53. Raffel, C. (2016). Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching. PhD thesis, Graduate School of Arts and Sciences, Columbia University. DOI: 10.1109/ICASSP.2016.7471641
  54. Řehůřek, R., and Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proc. of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50.
  55. Reunanen, J. (2003). Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research, 3:1371–1382.
  56. Rötter, G., Vatolkin, I., and Weihs, C. (2013). Computational prediction of high-level descriptors of music personal categories. In Lausen, B., den Poel, D. V., and Ultsch, A., editors, Algorithms from and for Nature and Life – Classification and Data Analysis, Studies in Classification, Data Analysis, and Knowledge Organization, pages 529–537. Springer. DOI: 10.1007/978-3-319-00035-0_54
  57. Saari, P., Eerola, T., and Lartillot, O. (2011). Generalizability and simplicity as criteria in feature selection: Application to mood classification in music. IEEE Transactions on Audio, Speech, and Language Processing, 19(6):1802–1812. DOI: 10.1109/TASL.2010.2101596
  58. Schindler, A. (2019). Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis. PhD thesis, Faculty of Informatics, TU Wien.
  59. Schreiber, H. (2015). Improving genre annotations for the Million Song Dataset. In Proc. of the 16th International Society for Music Information Retrieval Conference, ISMIR, pages 241–247.
  60. Sigtia, S., and Dixon, S. (2014). Improved music feature learning with deep neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pages 6959–6963. DOI: 10.1109/ICASSP.2014.6854949
  61. Silla, C. N., Jr., Koerich, A. L., and Kaestner, C. A. A. (2009). A feature selection approach for automatic music genre classification. International Journal of Semantic Computing, 3(2):183–208. DOI: 10.1142/S1793351X09000719
  62. Simonetta, F., Ntalampiras, S., and Avanzini, F. (2019). Multimodal music information processing and retrieval: Survey and future challenges. In Proc. of the International Workshop on Multilayer Music Representation and Processing, MMRP, pages 10–18. DOI: 10.1109/MMRP.2019.00012
  63. Sturm, B. L. (2012a). A survey of evaluation in music genre recognition. In 10th International Workshop on Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation, AMR, pages 29–66. DOI: 10.1007/978-3-319-12093-5_2
  64. Sturm, B. L. (2012b). Two systems for automatic music genre recognition: What are they really recognizing? In Proc. of the 2nd International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies, MIRUM, pages 69–74. DOI: 10.1145/2390848.2390866
  65. Sturm, B. L. (2013a). Classification accuracy is not enough. Journal of Intelligent Information Systems, 41(3):371–406. DOI: 10.1007/s10844-013-0250-y
  66. Sturm, B. L. (2013b). Evaluating music emotion recognition: Lessons from music genre recognition? In IEEE International Conference on Multimedia and Expo Workshops, ICMEW, pages 1–6. DOI: 10.1109/ICMEW.2013.6618342
  67. Tzanetakis, G., and Cook, P. R. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302. DOI: 10.1109/TSA.2002.800560
  68. Vatolkin, I. (2015). Exploration of two-objective scenarios on supervised evolutionary feature selection: A survey and a case study (application to music categorisation). In Proc. of the 8th International Conference on Evolutionary Multi-Criterion Optimization, pages 529–543. Springer. DOI: 10.1007/978-3-319-15892-1_36
  69. Vatolkin, I., Bonnin, G., and Jannach, D. (2014). Comparing audio features and playlist statistics for music classification. In Analysis of Large and Complex Data – Second European Conference on Data Analysis, ECDA, pages 437–447. DOI: 10.1007/978-3-319-25226-1_37
  70. Vatolkin, I., Preuß, M., and Rudolph, G. (2011). Multiobjective feature selection in music genre and style recognition tasks. In Krasnogor, N., and Lanzi, P. L., editors, Proc. of the 13th Annual Genetic and Evolutionary Computation Conference, GECCO, pages 411–418. ACM Press. DOI: 10.1145/2001576.2001633
  71. Vatolkin, I., Rudolph, G., and Weihs, C. (2015). Evaluation of album effect for feature selection in music genre recognition. In Proc. of the 16th International Society for Music Information Retrieval Conference, ISMIR, pages 169–175.
  72. Vatolkin, I., Theimer, W. M., and Botteck, M. (2010). AMUSE (Advanced MUSic Explorer): A multitool framework for music data analysis. In Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR, pages 33–38.
  73. Weihs, C., Jannach, D., Vatolkin, I., and Rudolph, G., editors (2017). Music Data Analysis: Foundations and Applications. CRC Press. DOI: 10.1201/9781315370996
  74. Wilkes, B. (2019). Analyse von bild-, text- und audiobasierten Merkmalen für die Klassifikation von Musikgenres [Analysis of image-, text-, and audio-based features for the classification of music genres]. Master’s thesis, Department of Computer Science, TU Dortmund.
  75. Witten, I. H., Frank, E., Hall, M. A., and Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, Burlington, Massachusetts.
  76. Zangerle, E., Tschuggnall, M., Wurzinger, S., and Specht, G. (2018). ALF-200k: Towards extensive multimodal analyses of music tracks and playlists. In Pasi, G., Piwowarski, B., Azzopardi, L., and Hanbury, A., editors, Advances in Information Retrieval, pages 584–590. Springer. DOI: 10.1007/978-3-319-76941-7_48
  77. Zitzler, E. (2012). Evolutionary multiobjective optimization. In Rozenberg, G., Bäck, T., and Kok, J. N., editors, Handbook of Natural Computing, Volume 2, pages 871–904. Springer, Berlin Heidelberg. DOI: 10.1007/978-3-540-92910-9_28
DOI: https://doi.org/10.5334/tismir.67 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jun 17, 2020
Accepted on: Dec 2, 2021
Published on: Jan 24, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Igor Vatolkin, Cory McKay, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.