References
- [1] Ashok, A., Rhinehart, N., Beainy, F., & Kitani, K. M. (2017). N2N learning: Network to network compression via policy gradient reinforcement learning. CoRR, abs/1709.06030.
- [2] Balke, S., Dittmar, C., Abeßer, J., Frieler, K., Pfleiderer, M., & Müller, M. (2018). Bridging the gap: Enriching YouTube videos with jazz music annotations. Frontiers in Digital Humanities, 5:1. DOI: 10.3389/fdigh.2018.00001
- [3] Benzi, K., Defferrard, M., Vandergheynst, P., & Bresson, X. (2016). FMA: A dataset for music analysis. CoRR, abs/1612.01840.
- [4] Bittner, R., & Bosch, J. J. (2019). Generalized metrics for single-F0 estimation evaluation. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
- [5] Bittner, R., Fuentes, M., Rubinstein, D., Jansson, A., Choi, K., & Kell, T. (2019). mirdata: Software for reproducible usage of datasets. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
- [6] Bittner, R., Salamon, J., Tierney, M., Mauch, M., Cannam, C., & Bello, J. (2014). MedleyDB: A multitrack dataset for annotation-intensive MIR research. In Proceedings of the 15th International Society for Music Information Retrieval Conference, Taipei, Taiwan.
- [7] Bucilua, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pages 535–541, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/1150402.1150464
- [8] Cui, J., Kingsbury, B., Ramabhadran, B., Saon, G., Sercu, T., Audhkhasi, K., Sethy, A., Nussbaum-Thom, M., & Rosenberg, A. (2017). Knowledge distillation across ensembles of multilingual models for low-resource languages. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. DOI: 10.1109/ICASSP.2017.7953073
- [9] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. DOI: 10.1109/CVPR.2009.5206848
- [10] Donahue, C., Mao, H. H., & McAuley, J. (2018). The NES Music Database: A multi-instrumental dataset with expressive performance attributes. In Proceedings of the 19th International Society for Music Information Retrieval Conference, Paris, France.
- [11] Doras, G., Esling, P., & Peeters, G. (2019). On the use of U-Net for dominant melody estimation in polyphonic music. In 2019 International Workshop on Multilayer Music Representation and Processing (MMRP), pages 66–70. DOI: 10.1109/MMRP.2019.00020
- [12] Dzhambazov, G. (2017). Knowledge-based Probabilistic Modeling for Tracking Lyrics in Music Audio Signals. PhD thesis, Universitat Pompeu Fabra.
- [13] Fonseca, E., Pons, J., Favory, X., Font, F., Bogdanov, D., Ferraro, A., Oramas, S., Porter, A., & Serra, X. (2017). Freesound datasets: A platform for the creation of open audio datasets. In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China.
- [14] Fujihara, H., & Goto, M. (2012). Lyrics-to-audio alignment and its application. In Multimodal Music Processing, volume 3 of Dagstuhl Follow-Ups, pages 23–36. Dagstuhl, Germany.
- [15] Fujihara, H., Goto, M., Ogata, J., & Okuno, H. G. (2011). LyricSynchronizer: Automatic synchronization system between musical audio signals and lyrics. IEEE Journal of Selected Topics in Signal Processing, 5(6), 1252–1261. DOI: 10.1109/JSTSP.2011.2159577
- [16] Goto, M. (2014). Singing information processing. In 12th International Conference on Signal Processing, pages 2431–2438. DOI: 10.1109/ICOSP.2014.7015431
- [17] Gupta, C., Tong, R., Li, H., & Wang, Y. (2018). Semi-supervised lyrics and solo-singing alignment. In Proceedings of the 19th International Society for Music Information Retrieval Conference.
- [18] Gupta, C., Yılmaz, E., & Li, H. (2019). Acoustic modeling for automatic lyrics-to-audio alignment. arXiv preprint arXiv:1906.10369. DOI: 10.21437/Interspeech.2019-1520
- [19] Hansen, J. K. (2012). Recognition of phonemes in a cappella recordings using temporal patterns and mel frequency cepstral coefficients. In Proceedings of the 9th Sound and Music Computing Conference, Copenhagen, Denmark.
- [20] Hinton, G., Vinyals, O., & Dean, J. (2014). Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop.
- [21] Humphrey, E. J., Montecchio, N., Bittner, R., Jansson, A., & Jehan, T. (2017). Mining labelled data from web-scale collections for vocal activity detection in music. In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China.
- [22] Iskandar, D., Wang, Y., Kan, M.-Y., & Li, H. (2006). Syllabic level automatic synchronization of music signals and text lyrics. In Proceedings of the 14th ACM International Conference on Multimedia, MM '06, pages 659–662, New York, NY, USA. ACM. DOI: 10.1145/1180639.1180777
- [23] Kan, M.-Y., Wang, Y., Iskandar, D., Nwe, T. L., & Shenoy, A. (2008). LyricAlly: Automatic synchronization of textual lyrics to acoustic music signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 338–349. DOI: 10.1109/TASL.2007.911559
- [24] Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master's thesis, University of Toronto, Department of Computer Science.
- [25] Kruspe, A. M. (2016). Bootstrapping a system for phoneme recognition and keyword spotting in unaccompanied singing. In Proceedings of the 17th International Society for Music Information Retrieval Conference, pages 358–364, New York City, United States.
- [26] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. DOI: 10.1109/5.726791
- [27] Lee, S. W., & Scott, J. (2017). Word level lyrics-audio synchronization using separated vocals. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. DOI: 10.1109/ICASSP.2017.7952235
- [28] Maia, L., Fuentes, M., Biscainho, L., Rocamora, M., & Essid, S. (2019). SAMBASET: A dataset of historical samba de enredo recordings for computational music analysis. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
- [29] Mauch, M., Fujihara, H., & Goto, M. (2012). Integrating additional chord information into HMM-based lyrics-to-audio alignment. IEEE Transactions on Audio, Speech, and Language Processing, 20, 200–210. DOI: 10.1109/TASL.2011.2159595
- [30] Mesaros, A. (2013). Singing voice identification and lyrics transcription for music information retrieval. In 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD), pages 1–10. DOI: 10.1109/SpeD.2013.6682644
- [31] Mesaros, A., & Virtanen, T. (2010). Automatic recognition of lyrics in singing. EURASIP Journal on Audio, Speech, and Music Processing, 2010, 1–11. DOI: 10.1155/2010/546047
- [32] Meseguer-Brocal, G., Cohen-Hadria, A., & Peeters, G. (2018). DALI: A large dataset of synchronised audio, lyrics and notes, automatically created using teacher-student machine learning paradigm. In Proceedings of the 19th International Society for Music Information Retrieval Conference, Paris, France.
- [33] Meseguer-Brocal, G., Peeters, G., Pellerin, G., Buffa, M., Cabrio, E., Faron Zucker, C., Giboin, A., Mirbel, I., Hennequin, R., Moussallam, M., Piccoli, F., & Fillon, T. (2017). WASABI: A two million song database project with audio and cultural metadata plus WebAudio enhanced client applications. In Web Audio Conference, London, U.K.
- [34] Müller, M., Kurth, F., Damm, D., Fremerey, C., & Clausen, M. (2007). Lyrics-based audio retrieval and multimodal navigation in music collections. In Kovács, L., Fuhr, N., & Meghini, C., editors, Research and Advanced Technology for Digital Libraries, pages 112–123. Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-540-74851-9_10
- [35] Nieto, O., McCallum, M., Davies, M., Robertson, A., Stark, A., & Egozy, E. (2019). The Harmonix set: Beats, downbeats, and functional segment annotations of Western popular music. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
- [36] Peeters, G., & Fort, K. (2012). Towards a (better) definition of annotated MIR corpora. In Proceedings of the 13th International Society for Music Information Retrieval Conference, Porto, Portugal.
- [37] Ramona, M., Richard, G., & David, B. (2008). Vocal detection in music with support vector machines. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. DOI: 10.1109/ICASSP.2008.4518002
- [38] Rivest, R. (1992). The MD5 message-digest algorithm. RFC 1321, Internet Engineering Task Force Network Working Group. DOI: 10.17487/rfc1321
- [39] Schlüter, J., & Grill, T. (2015). Exploring data augmentation for improved singing voice detection with neural networks. In Proceedings of the 16th International Society for Music Information Retrieval Conference, Malaga, Spain.
- [40] Settles, B. (2008). Curious Machines: Active Learning with Structured Instances. PhD thesis, University of Wisconsin–Madison.
- [41] Smith, J. (2013). Correlation Analyses of Encoded Music Performance. PhD thesis, Stanford University, Music Department.
- [42] Stoller, D., Durand, S., & Ewert, S. (2019). End-to-end lyrics alignment for polyphonic music using an audio-to-character recognition model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 5275–5279. DOI: 10.1109/ICASSP.2019.8683470
- [43] Watanabe, S., Hori, T., Le Roux, J., & Hershey, J. (2017). Student-teacher network learning with enhanced features. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 5275–5279. DOI: 10.1109/ICASSP.2017.7953163
- [44] Wong, C. H., Szeto, W. M., & Wong, K. H. (2007). Automatic lyrics alignment for Cantonese popular music. Multimedia Systems, 12(4), 307–323. DOI: 10.1007/s00530-006-0055-8
- [45] Wu, C., & Lerch, A. (2017). Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data. In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China.
- [46] Yesiler, F., Tralie, C., Correya, A., Furtado Silva, D., Tovstogan, P., Gómez, E., & Serra, X. (2019). Da-TACOS: A dataset for cover song identification and understanding. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
