References
- Agrawal, Y., Jain, S., Carlson, E., Toiviainen, P., and Alluri, V. (2020). Towards multimodal MIR: predicting individual differences from music‑induced movement. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR) (pp. 54–61).
- Alfaro‑Contreras, M., Valero‑Mas, J. J., Iñesta, J. M., and Calvo‑Zaragoza, J. (2023). Late multimodal fusion for image and audio music transcription. Expert Systems with Applications, 216, 119491.
- Antović, M., Küssner, M. B., Kempf, A., Omigie, D., Hashim, S., and Schiavio, A. (2023). ‘A huge man is bursting out of a rock’: Bodies, motion, and creativity in verbal reports of musical connotation. Journal of New Music Research, 52(1), 73–86.
- Arthur, C., and Condit‑Schultz, N. (2023). The Coordinated Corpus of Popular Musics (CoCoPops): A meta‑corpus of melodic and harmonic transcriptions. In Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR) (pp. 239–246).
- Baltrušaitis, T., Ahuja, C., and Morency, L.‑P. (2019). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443.
- Barthet, M., Anglade, A., Fazekas, G., Kolozali, S., and Macrae, R. (2011). Music recommendation for music learning: Hotttabs, a multimedia guitar tutor. In Proceedings of the Workshop on Music Recommendation and Discovery (WOMRAD) (pp. 7–13). ACM.
- Barthet, M., and Dixon, S. (2011). Ethnographic observations of musicologists at the British Library: Implications for music information retrieval. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR) (pp. 353–358).
- Bemman, B., Bertelsen, L. R., Wärja, M., and Bonde, L. O. (2023). Inter‑rater agreement in classifying music according to a guided imagery and music taxonomy. Journal of Music Therapy, 60(3), 282–313.
- Bertin‑Mahieux, T., Ellis, D. P. W., Whitman, B., and Lamere, P. (2011). The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR) (pp. 591–596).
- Bosher, H. (2021). Copyright in the music industry: A practical guide to exploiting and enforcing rights. Edward Elgar Publishing.
- Burgoyne, J., Wild, J., and Fujinaga, I. (2011). An expert ground truth set for audio chord recognition and music analysis. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR) (pp. 633–638).
- Cancino‑Chacón, C. E., and Pilkov, I. (2024). The Rach3 Dataset: Towards data‑driven analysis of piano performance rehearsal. In S. Rudinac, A. Hanjalic, C. Liem, M. Worring, and B. Jónsson (Eds.), MultiMedia Modeling (Lecture Notes in Computer Science, Vol. 14565, pp. 28–41). Springer Nature.
- Caplan‑Auerbach, J., Marczewski, K., and Bullock, G. (2023). Beast Quake (Taylor's Version): Analysis of seismic signals recorded during two Taylor Swift concerts. GSA Today, 34, 4–10.
- Carvalho, L., Washüttl, T., and Widmer, G. (2023). Self‑supervised contrastive learning for robust audio‑sheet music retrieval systems. In Proceedings of the 14th Conference on ACM Multimedia Systems (MMSys) (pp. 239–248).
- Carvalho, L., and Widmer, G. (2023). Towards robust and truly large‑scale audio‑sheet music retrieval. In Proceedings of the 6th International Conference on Multimedia Information Processing and Retrieval (MIPR) (pp. 1–6). IEEE.
- Celma, Ò., Herrera, P., and Serra, X. (2006). A multimodal approach to bridge the music semantic gap. In International Conference on Semantics and Digital Media Technologies (Posters and Demos).
- Chander, A., Huberth, M., Davis, S., Silverstein, S., and Fujioka, T. (2022). Violinists employ more expressive gesture and timing around global musical resolutions. Music Perception, 39(3), 268–288.
- Chen, X., Zhang, H., Wu, S., Zheng, J., Sun, L., and Zhang, K. (2022). A dataset for learning stylistic and cultural correlations between music and videos. Cognitive Computation and Systems, 4(2), 177–187.
- Cheng, H.‑T., Yang, Y.‑H., Lin, Y.‑C., and Chen, H. H. (2009). Multimodal structure segmentation and analysis of music using audio and textual information. In Proceedings of the 2009 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1677–1680). IEEE.
- Chiarandini, L., Zanoni, M., and Sarti, A. (2011). A system for dynamic playlist generation driven by multimodal control signals and descriptors. In Proceedings of the 13th International Workshop on Multimedia Signal Processing (MMSP) (pp. 1–6). IEEE.
- Choi, K., and Wang, Y. (2021). Listen, read, and identify: Multimodal singing language identification of music. In Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR) (pp. 121–127).
- Christodoulou, A., Lartillot, O., and Jensenius, A. R. (2024). Multimodal music datasets? Challenges and future goals in music processing. International Journal of Multimedia Information Retrieval, 13(3), 37.
- Cope, D. (2001). Virtual Music: Computer Synthesis of Musical Style. MIT Press.
- Cui, X., Qu, X., Li, D., Yang, Y., Li, Y., and Zhang, X. (2023). MKGCN: Multi‑modal knowledge graph convolutional network for music recommender systems. Electronics, 12(12), 2688.
- da Silva, A. C. M., Silva, D. F., and Marcacini, R. M. (2022). Multimodal representation learning over heterogeneous networks for tag‑based music retrieval. Expert Systems with Applications, 207, 117969.
- da Silva, A. C. M., Silva, D. F., and Marcacini, R. M. (2024). Artist similarity based on heterogeneous graph neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32, 3717–3729.
- Delbouys, R., Hennequin, R., Piccoli, F., Royo‑Letelier, J., and Moussallam, M. (2018). Music mood detection based on audio and lyrics with deep neural net. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR) (pp. 370–375).
- Deng, Z., and Zhou, R. (2023). Vocal92: Audio dataset with a cappella solo singing and speech. IEEE Access, 11, 140958–140966.
- Doh, S., Won, M., Choi, K., and Nam, J. (2023). Textless speech‑to‑music retrieval using emotion similarity. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1–5). IEEE.
- Dorfer, M., Arzt, A., and Widmer, G. (2016). Towards score following in sheet music images. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR).
- Dorfer, M., Arzt, A., and Widmer, G. (2017). Learning audio‑sheet music correspondences for score identification and offline alignment. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR).
- Dorfer, M., Hajič Jr., J., Arzt, A., Frostel, H., and Widmer, G. (2018). Learning audio–sheet music correspondences for cross‑modal retrieval and piece identification. Transactions of the International Society for Music Information Retrieval, 1(1), 22–31.
- Ephrat, A., Mosseri, I., Lang, O., Dekel, T., Wilson, K., Hassidim, A., Freeman, W. T., and Rubinstein, M. (2018). Looking to listen at the cocktail party: A speaker‑independent audio‑visual model for speech separation. ACM Transactions on Graphics, 37(4), 112.
- Essid, S., Lin, X., Gowing, M., Kordelas, G., Aksay, A., Kelly, P., Fillon, T., Zhang, Q., Dielmann, A., Kitanovski, V., Tournemenne, R., Masurelle, A., Izquierdo, E., O'Connor, N. E., Daras, P., and Richard, G. (2012). A multi‑modal dance corpus for research into interaction between humans in virtual environments. Journal on Multimodal User Interfaces.
- Faudemay, P., Montacie, C., and Caraty, M.‑J. (1997). Video indexing based on image and sound. In C.‑C. J. Kuo, S.‑F. Chang, and V. N. Gudivada (Eds.), Multimedia Storage and Archiving Systems II (Vol. 3229, pp. 57–69). SPIE.
- Foscarin, F., Jacquemard, F., and Fournier‑S'niehotta, R. (2019). A diff procedure for music score files. In Proceedings of the 6th International Conference on Digital Libraries for Musicology (DLfM) (pp. 58–64). Association for Computing Machinery.
- Foucard, R., Durrieu, J.‑L., Lagrange, M., and Richard, G. (2010). Multimodal similarity between musical streams for cover version detection. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5514–5517). IEEE.
- Fu, C., Chen, P., Shen, Y., Qin, Y., Zhang, M., Lin, X., Qiu, Z., Lin, W., Yang, J., Zheng, X., Li, K., Sun, X., and Ji, R. (2023). MME: A comprehensive evaluation benchmark for multimodal large language models. CoRR, abs/2306.13394.
- Gabbolini, G., and Bridge, D. (2024). Surveying more than two decades of music information retrieval research on playlists. ACM Transactions on Intelligent Systems and Technology, 15, 1–68.
- Galán‑Cuenca, A., Valero‑Mas, J. J., Martinez‑Sevilla, J. C., Hidalgo‑Centeno, A., Pertusa, A., and Calvo‑Zaragoza, J. (2024). MUSCAT: A multimodal music collection for automatic transcription of real recordings and image scores. In Proceedings of the 32nd ACM International Conference on Multimedia (MM) (pp. 583–591).
- Gillet, O., and Richard, G. (2006). ENST‑Drums: An extensive audio‑visual database for drum signals processing. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR) (pp. 156–159).
- Goehr, L. (1994). The Imaginary Museum of Musical Works: An Essay in the Philosophy of Music. Oxford University Press.
- Gotham, M. R. H. (2021). Connecting the dots: Engaging wider forms of openness for the mutual benefit of musicians and musicologists. Empirical Musicology Review, 16(1), 34–46.
- Gotham, M. R. H., and Jonas, P. (2022). The OpenScore Lieder Corpus. In Proceedings of the 2021 Music Encoding Conference (pp. 131–136). Humanities Commons.
- Gotham, M. R. H., Redbond, M., Bower, B., and Jonas, P. (2023). The “OpenScore String Quartet” Corpus. In Proceedings of the 10th International Conference on Digital Libraries for Musicology (DLfM) (pp. 49–57). ACM.
- Gu, X., Ou, L., Ong, D., and Wang, Y. (2022). MM‑ALT: A multimodal automatic lyric transcription system. In Proceedings of the 30th ACM International Conference on Multimedia (MM) (pp. 3328–3337).
- Gu, Y., Wang, Z., Zhou, J., Wang, Z., and Zhu, H. (2023). Acoustics‑text dual‑modal joint representation learning for cover song identification. In Proceedings of the 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 1–8). IEEE.
- Han, D., Gotham, M. R. H., Kim, D., Park, H., Lee, S., and Jeong, D. (2024). Six dragons fly again: Reviving 15th‑century Korean court music with transformers and novel encoding. In Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR) (pp. 217–224).
- Harte, C., Sandler, M. B., Abdallah, S. A., and Gómez, E. (2005). Symbolic representation of musical chords: A proposed syntax for text annotations. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR) (pp. 66–71).
- Haus, G., and Pollastri, E. (2000). A multimodal framework for music inputs. In Proceedings of the 8th ACM International Conference on Multimedia (pp. 382–384).
- Hentschel, F., and Kreutz, G. (2021). The perception of musical expression in the nineteenth century: The case of the glorifying hymnic. Music & Science, 4, 205920432110123.
- Herold, K., Kepper, J., Mo, R., and Seipelt, A. (2020). MusicDiff: A diff tool for MEI. In Proceedings of the 2020 Music Encoding Conference (p. 59).
- Hochenbaum, J., and Kapur, A. (2012). Nuance: A software tool for capturing synchronous data streams from multimodal musical systems. In Proceedings of the 38th International Computer Music Conference (ICMC) (pp. 337–342).
- Hsu, J.‑L., and Huang, C.‑C. (2015). Designing a graph‑based framework to support a multi‑modal approach for music information retrieval. Multimedia Tools and Applications, 74(15), 5401–5427.
- Hu, X., and Downie, J. S. (2010). Improving mood classification in music digital libraries by combining lyrics and audio. In Proceedings of the 10th Annual Joint Conference on Digital Libraries (JCDL) (pp. 159–168).
- Hu, X., Li, F., and Liu, R. (2022). Detecting music‑induced emotion based on acoustic analysis and physiological sensing: A multimodal approach. Applied Sciences, 12(18), 9354.
- Hung, H.‑T., Ching, J., Doh, S., Kim, N., Nam, J., and Yang, Y.‑H. (2021). EMOPIA: A multi‑modal pop piano dataset for emotion recognition and emotion‑based music generation. In Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR) (pp. 318–325).
- Jang, J.‑S. R., Lee, H.‑R., Chen, J.‑C., and Lin, C.‑Y. (2004). Research and developments of a multi‑modal MIR engine for commercial applications in East Asia. Journal of the American Society for Information Science and Technology, 55(12), 1067–1076.
- Jensenius, A. R. (2022). Sound Actions: Conceptualizing Musical Instruments. MIT Press.
- Johansson, E., and Lindgren, J. (2023). The Gunnlod dataset: Engineering a dataset for multi‑modal music generation. Technical Report 2023:456, KTH School of Electrical Engineering and Computer Science (EECS).
- Kamp, M., Summers, T., and Sweeney, M. (Eds.). (2016). Ludomusicology: Approaches to Video Game Music. Equinox Publishing.
- Koelstra, S., Muhl, C., Soleymani, M., Lee, J.‑S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., and Patras, I. (2011). DEAP: A database for emotion analysis; using physiological signals. IEEE Transactions on Affective Computing, 3(1), 18–31.
- Koozin, T. (2011). Guitar voicing in Pop‑Rock music: A performance‑based analytical approach. Music Theory Online, 17(3).
- Kritsis, K., Gkiokas, A., Pikrakis, A., and Katsouros, V. (2021). Attention‑based multimodal feature fusion for dance motion generation. In Proceedings of the 2021 International Conference on Multimodal Interaction (ICMI) (pp. 763–767). Association for Computing Machinery.
- Kurth, F., Damm, D., Fremerey, C., Müller, M., and Clausen, M. (2008). A framework for managing multimodal digitized music collections. In Proceedings of the 12th European Conference on Research and Advanced Technology for Digital Libraries (ECDL) (pp. 334–345). Springer.
- Kurth, F., Müller, M., Damm, D., Fremerey, C., Ribbrock, A., and Clausen, M. (2005). SyncPlayer—an advanced system for multimodal music access. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR) (pp. 381–388).
- Laczkó, B., and Jensenius, A. R. (2021). Reflections on the development of the musical gestures toolbox for Python. In Proceedings of the Nordic Sound and Music Computing Conference (NordicSMC).
- Laurier, C., Grivolla, J., and Herrera, P. (2008). Multimodal music mood classification using audio and lyrics. In Proceedings of the 7th International Conference on Machine Learning and Applications (ICMLA) (pp. 688–693). IEEE.
- Lee, S. W., and Essl, G. (2014). Communication, control, and state sharing in collaborative live coding. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME) (pp. 263–268).
- Leman, M. (2008). Embodied Music Cognition and Mediation Technology. MIT Press.
- Li, B., Dinesh, K., Xu, C., Sharma, G., and Duan, Z. (2019). Online audio‑visual source association for chamber music performances. Transactions of the International Society for Music Information Retrieval, 2(1), 29–42.
- Li, X., Hu, D., and Lu, X. (2017). Image2song: Song retrieval via bridging image content and lyric words. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV) (pp. 5649–5658).
- Li, Y., Wang, X., Wu, R., Xu, W., and Chen, W. (2023). A CRNN‑GCN piano transcription model based on audio and skeleton features. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) (pp. 1–5). IEEE.
- Liem, C. C., Müller, M., Eck, D., Tzanetakis, G., and Hanjalic, A. (2011). The need for music information retrieval with user‑centered and multimodal strategies. In Proceedings of the 1st International ACM Workshop on Music Information Retrieval with User‑Centered and Multimodal Strategies (MIRUM) (pp. 1–6).
- Liu, R., and Hu, X. (2020). A multimodal music recommendation system with listeners' personality and physiological signals. In Proceedings of the 2020 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (pp. 357–360).
- Lu, Q., Chen, X., Yang, D., and Wang, J. (2010). Boosting for multi‑modal music emotion classification. In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR) (pp. 105–110).
- Lübbers, D. (2005). Sonixplorer: Combining visualization and auralization for content‑based exploration of music collections. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR) (pp. 590–593).
- Lübbers, D., and Jarke, M. (2009). Adaptive multimodal exploration of music collections. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR) (pp. 195–200).
- Manco, I., Benetos, E., Quinton, E., and Fazekas, G. (2021). MusCaps: Generating captions for music audio. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE.
- Manilow, E., Wichern, G., Seetharaman, P., and Roux, J. L. (2019). Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity. In Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 45–49). IEEE.
- Martín‑Gutiérrez, D., Peñaloza, G. H., Belmonte‑Hernández, A., and García, F. Á. (2020). A multimodal end‑to‑end deep learning architecture for music popularity prediction. IEEE Access, 8, 39361–39374.
- Martins, F. D., and Gotham, M. (2023). TiLiA: A timeline annotator for all. In HCI International Conference 2023, Workshop 1: Interactive Technologies for Analysing and Visualizing Musical Structure.
- Mayer, R., and Rauber, A. (2011). Musical genre classification by ensembles of audio and lyrics features. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR) (pp. 675–680).
- McKay, C. (2010). Automatic music classification with jMIR. PhD thesis, Department of Music Research, Schulich School of Music, McGill University.
- McKay, C., Burgoyne, J. A., Hockman, J., Smith, J. B. L., Vigliensoni, G., and Fujinaga, I. (2010). Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR) (pp. 213–218).
- McVicar, M., Freeman, T., and De Bie, T. (2011). Mining the correlation between lyrical and audio features and the emergence of mood. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR) (pp. 783–788).
- Melechovsky, J., Roy, A., and Herremans, D. (2024). MidiCaps: A large‑scale MIDI dataset with text captions. arXiv preprint arXiv:2406.02255.
- Meseguer‑Brocal, G., Cohen‑Hadria, A., and Peeters, G. (2020). Creating DALI, a large dataset of synchronized audio, lyrics, and notes. Transactions of the International Society for Music Information Retrieval, 3(1), 55–67.
- Meseguer‑Brocal, G., and Peeters, G. (2020). Content based singing voice source separation via strong conditioning using aligned phonemes. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR) (pp. 819–827).
- Mesz, B., Trevisan, M. A., and Sigman, M. (2011). The taste of music. Perception, 40(2), 209–219.
- Miranda, E. R. (Ed.). (2021). Handbook of artificial intelligence for music. Springer.
- Montecchio, N., Roy, P., and Pachet, F. (2020). The skipping behavior of users of music streaming services and its relation to musical structure. PLOS ONE, 15(9), e0239418.
- Moscati, M., Parada‑Cabaleiro, E., Deldjoo, Y., Zangerle, E., and Schedl, M. (2022). Music4All‑Onion: A large‑scale multi‑faceted content‑centric music recommendation dataset. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM) (pp. 4339–4343).
- Müller, M. (2021). Fundamentals of music processing (2nd ed.). Springer.
- Müller, M., Goto, M., and Dixon, S. (2011). Multimodal music processing (Dagstuhl Seminar 11041). In Dagstuhl Reports. Schloss Dagstuhl‑Leibniz‑Zentrum für Informatik.
- Nadkarni, S., Rao, P., and Clayton, M. (2024). Identifying melodic motifs and stable notes from gestural information in Indian vocal performances. Transactions of the International Society for Music Information Retrieval, 7(1), 246–263.
- Neuwirth, M., Harasim, D., Moss, F. C., and Rohrmeier, M. (2018). The Annotated Beethoven Corpus (ABC): A dataset of harmonic analyses of all Beethoven string quartets. Frontiers in Digital Humanities, 5, 16.
- Oramas, S., Barbieri, F., Nieto, O., and Serra, X. (2018). Multimodal deep learning for music genre classification. Transactions of the International Society for Music Information Retrieval, 1(1), 4–21.
- Oramas, S., Espinosa‑Anke, L., Lawlor, A., Serra, X., and Saggion, H. (2016). Exploring customer reviews for music genre classification and evolutionary studies. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR).
- Ostermann, F., Vatolkin, I., and Ebeling, M. (2023). AAM: A dataset of artificial audio multitracks for diverse music information retrieval tasks. EURASIP Journal on Audio, Speech, and Music Processing, 2023(1), 13.
- Perra, J., Poulin‑Charronnat, B., Baccino, T., and Drai‑Zerbib, V. (2021). Review on eye‑hand span in sight‑reading of music. Journal of Eye Movement Research, 14(4).
- Pesek, M., Strle, G., Kavčič, A., and Marolt, M. (2017). The Moodo dataset: Integrating user context with emotional and color perception of music for affective music information retrieval. Journal of New Music Research, 46(3), 246–260.
- Putkinen, V., Zhou, X., Gan, X., Yang, L., Becker, B., Sams, M., and Nummenmaa, L. (2024). Bodily maps of musical sensations across cultures. Proceedings of the National Academy of Sciences, 121(5), e2308859121.
- Qi, H., Muramatsu, T., and Hashimoto, S. (1997). Multimedia environment for sound database system. In Proceedings of the 1997 International Computer Music Conference (ICMC) (pp. 105–108).
- Quested, G., Boyle, R., and Ng, K. (2008). Polyphonic note tracking using multimodal retrieval of musical events. In Proceedings of the 2008 International Computer Music Conference (ICMC).
- Quilingking Tomas, J. P., Jamilla, R. A. S., Lopo, K. S., and Camba, C. E. (2020). Multimodal emotion detection model implementing late fusion of audio and lyrics in Filipino music. In Proceedings of the 3rd International Conference on Computing and Big Data (ICCBD) (pp. 78–84).
- Range, M. (2024). Wagner's ‘Bridal Chorus' from Lohengrin and its use as a wedding march. Journal of the Royal Musical Association, 1–25.
- Rossetto, F., Dalton, J., and Murray‑Smith, R. (2023). Generating multimodal augmentations with LLMs from song metadata for music information retrieval. In Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications (LGM3A) (pp. 51–59).
- Schedl, M., and Schnitzer, D. (2014). Location‑aware music artist recommendation. In Proceedings of the 20th Anniversary International Conference on MultiMedia Modeling (MMM) (pp. 205–213). Springer.
- Schindler, A., Mayer, R., and Rauber, A. (2012). Facilitating comprehensive benchmarking experiments on the million song dataset. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR) (pp. 469–474).
- Schindler, A., and Rauber, A. (2015). An audio‑visual approach to music genre classification through affective color features. In Proceedings of the 37th European Conference on IR Research (ECIR) (pp. 61–67). Springer.
- Schmidt, E. M., and Kim, Y. E. (2011). Modeling musical emotion dynamics with conditional random fields. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR) (pp. 777–782).
- Schneider, A. (2018). Systematic musicology: A historical interdisciplinary perspective. In Springer Handbook of Systematic Musicology (pp. 1–24). Springer.
- Schuller, B., Rigoll, G., and Lang, M. (2004). Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine‑belief network architecture. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (Vol. 1, pp. 577–580). IEEE.
- Schuller, B., Weninger, F., and Dorfner, J. (2011). Multi‑modal non‑prototypical music mood analysis in continuous space: Reliability and performances. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR).
- Schuller, B., Zobl, M., Rigoll, G., and Lang, M. (2003). A hybrid music retrieval system using belief networks to integrate multimodal queries and contextual knowledge. In Proceedings of the 2003 International Conference on Multimedia and Expo (ICME) (Vol. 1, pp. 57–60).
- Sears, D. R. W., Verbeten, J. E., and Percival, H. M. (2023). Does order matter? Harmonic priming effects for scrambled tonal chord sequences. Journal of Experimental Psychology: Human Perception and Performance, 49(7), 999–1015.
- Serra, X. (2014). Creating research corpora for the computational study of music: The case of the CompMusic project. In Proceedings of the 2014 AES International Conference on Semantic Audio. Audio Engineering Society.
- Simonetta, F., Ntalampiras, S., and Avanzini, F. (2019). Multimodal music information processing and retrieval: Survey and future challenges. In Proceedings of the 2019 International Workshop on Multilayer Music Representation and Processing (MMRP). IEEE.
- Simonetta, F., Ntalampiras, S., and Avanzini, F. (2021). Audio‑to‑score alignment using deep automatic music transcription. In Proceedings of the 23rd International Workshop on Multimedia Signal Processing (MMSP) (pp. 1–6). IEEE.
- Slizovskaia, O., Gómez, E., and Haro, G. (2017). Musical instrument recognition in user‑generated videos using a multimodal convolutional neural network architecture. In Proceedings of the 2017 ACM International Conference on Multimedia Retrieval (ICMR) (pp. 226–232).
- Soleymani, M., Pantic, M., and Pun, T. (2011). Multimodal emotion recognition in response to videos. IEEE Transactions on Affective Computing, 3(2), 211–223.
- Stewart, S., Avramidis, K., Feng, T., and Narayanan, S. (2024). Emotion‑aligned contrastive learning between images and music. In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8135–8139). IEEE.
- Strähle, J., and Rödel, J. (2018). Music as key‑influencer of fashion trends. In J. Strähle (Ed.), Fashion & Music (Springer Series in Fashion Business, pp. 31–49). Springer.
- Su, F., and Xue, H. (2017). Graph‑based multimodal music mood classification in discriminative latent space. In Proceedings of the 23rd International Conference on MultiMedia Modeling (MMM) (Vol. I, pp. 152–163). Springer.
- Sung, B.‑H., and Wei, S.‑C. (2021). BECMER: A fusion model using BERT and CNN for music emotion recognition. In Proceedings of the 22nd International Conference on Information Reuse and Integration for Data Science (IRI) (pp. 437–444). IEEE.
- Tabaza, A., Quishawi, O., Yaghi, A., and Qawasmeh, O. (2024). Binding text, images, graphs, and audio for music representation learning. In Proceedings of the Cognitive Models and Artificial Intelligence Conference (AICCONF) (pp. 139–146).
- Thomas, V., Damm, D., Fremerey, C., Clausen, M., Kurth, F., and Müller, M. (2012). PROBADO music: A multimodal online music library. In Proceedings of the 38th International Computer Music Conference (ICMC).
- Thomas, V., Fremerey, C., Damm, D., and Clausen, M. (2009). SLAVE: A score‑lyrics‑audio‑video‑explorer. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR) (pp. 717–722).
- Turchet, L., O'Sullivan, B., Ortner, R., and Guger, C. (2024). Emotion recognition of playing musicians from EEG, ECG, and acoustic signals. IEEE Transactions on Human‑Machine Systems, 1–11.
- Tymoczko, D., Gotham, M., Cuthbert, M. S., and Ariza, C. (2019). The RomanText format: A flexible and standard method for representing Roman numeral analyses. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR) (pp. 123–129).
- Varni, G., Volpe, G., and Mazzarino, B. (2011). Towards a social retrieval of music content. In Proceedings of the 3rd International Conference on Privacy, Security, Risk and Trust (PASSAT) and the 3rd International Conference on Social Computing (SocialCOM) (pp. 1466–1473). IEEE.
- Vatolkin, I., and McKay, C. (2022). Multi‑objective investigation of six feature source types for multi‑modal music classification. Transactions of the International Society for Music Information Retrieval, 5(1), 1–19.
- Wang, Y., Kan, M.‑Y., Nwe, T. L., Shenoy, A., and Yin, J. (2004). LyricAlly: Automatic synchronization of acoustic musical signals and textual lyrics. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MULTIMEDIA) (pp. 212–219).
- Weck, B., Kirchhoff, H., Grosche, P., and Serra, X. (2024). WikiMuTe: A web‑sourced dataset of semantic descriptions for music audio. In Proceedings of the 30th International Conference on MultiMedia Modeling (MMM) (pp. 42–56). Springer.
- Weiß, C., Arifi‑Müller, V., Krause, M., Zalkow, F., Klauk, S., Kleinertz, R., and Müller, M. (2023). Wagner Ring Dataset: A complex opera scenario for music processing and computational musicology. Transactions of the International Society for Music Information Retrieval, 6(1), 135–149.
- Weiß, C., Zalkow, F., Arifi‑Müller, V., Müller, M., Koops, H. V., Volk, A., and Grohganz, H. G. (2021). Schubert Winterreise Dataset: A multimodal scenario for music analysis. Journal on Computing and Cultural Heritage, 14(2), 25:1–25:18.
- Wengelin, Å., and Johansson, V. (2023). Investigating writing processes with keystroke logging. In O. Kruse, C. Rapp, C. M. Anson, K. Benetos, E. Cotos, A. Devitt, and A. Shibani (Eds.), Digital writing technologies in higher education (pp. 405–420). Springer International Publishing.
- White, C. (2022). The music in the data: Corpus analysis, music analysis, and tonal traditions. Routledge.
- Wiggins, G. (2009). Computer representation of music in the research environment. In T. Crawford and L. Gibson (Eds.), Modern methods for musicology: Prospects, proposals, and realities (pp. 7–22). Ashgate Publishing.
- Wilkes, B., Vatolkin, I., and Müller, H. (2021). Statistical and visual analysis of audio, text, and image features for multi‑modal music genre recognition. Entropy, 23(11), 1502.
- Wilkinson, M. D., et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018.
- Won, M., Oramas, S., Nieto, O., Gouyon, F., and Serra, X. (2021). Multimodal metric learning for tag‑based music retrieval. In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 591–595). IEEE.
- Wu, Y., Gardner, J., Manilow, E., Simon, I., Hawthorne, C., and Engel, J. H. (2022). The Chamber Ensemble Generator: Limitless high‑quality MIR data via generative modeling. CoRR, abs/2209.14458.
- Xu, B., Wang, X., and Tang, X. (2014). Fusing music and video modalities using multi‑timescale shared representations. In Proceedings of the 22nd ACM International Conference on Multimedia (MM) (pp. 1073–1076).
- Yang, D., Goutam, A., Ji, K., and Tsai, T. (2022). Large‑scale multimodal piano music identification using marketplace fingerprinting. Algorithms, 15(5), 146.
- Yang, Y.‑H., Lin, Y.‑C., Su, Y.‑F., and Chen, H. H. (2008). A regression approach to music emotion recognition. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 448–457.
- Yu, S., Yu, Y., Sun, X., and Li, W. (2023). A neural harmonic‑aware network with gated attentive fusion for singing melody extraction. Neurocomputing, 521, 160–171.
- Zalkow, F., Balke, S., Arifi‑Müller, V., and Müller, M. (2020). MTD: A multimodal dataset of musical themes for MIR research. Transactions of the International Society for Music Information Retrieval, 3(1), 180–192.
- Zangerle, E., Tschuggnall, M., Wurzinger, S., and Specht, G. (2018). ALF‑200k: Towards extensive multimodal analyses of music tracks and playlists. In Proceedings of the 40th European Conference on IR Research (ECIR) (pp. 584–590). Springer.
- Zeitler, J., Weiß, C., Arifi‑Müller, V., and Müller, M. (2024). BPSD: A coherent multi‑version dataset for analyzing the first movements of Beethoven's piano sonatas. Transactions of the International Society for Music Information Retrieval, 7(1), 195–212.
- Zeng, D., Yu, Y., and Oyama, K. (2021). MusicTM‑dataset for joint representation learning among sheet music, lyrics, and musical audio. In Proceedings of the 8th Conference on Sound and Music Technology (CSMT) (pp. 78–89). Springer.
- Zhang, Y., Zhou, Z., Li, X., Yu, F., and Sun, M. (2023). CCOM‑HuQin: An annotated multimodal Chinese fiddle performance dataset. Transactions of the International Society for Music Information Retrieval, 6(1), 60–74.
- Zhen, C., and Xu, J. (2010). Notice of retraction: Multi‑modal music genre classification approach. In Proceedings of the 3rd International Conference on Computer Science and Information Technology (CSIT) (pp. 398–402). IEEE.
