Creating DALI, a Large Dataset of Synchronized Audio, Lyrics, and Notes

Open Access | Jun 2020

References

  1. Ashok, A., Rhinehart, N., Beainy, F., & Kitani, K. M. (2017). N2N learning: Network to network compression via policy gradient reinforcement learning. CoRR, abs/1709.06030.
  2. Balke, S., Dittmar, C., Abeßer, J., Frieler, K., Pfleiderer, M., & Müller, M. (2018). Bridging the gap: Enriching YouTube videos with jazz music annotations. Frontiers in Digital Humanities, 5:1. DOI: 10.3389/fdigh.2018.00001
  3. Benzi, K., Defferrard, M., Vandergheynst, P., & Bresson, X. (2016). FMA: A dataset for music analysis. CoRR, abs/1612.01840.
  4. Bittner, R., & Bosch, J. J. (2019). Generalized metrics for single-f0 estimation evaluation. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
  5. Bittner, R., Fuentes, M., Rubinstein, D., Jansson, A., Choi, K., & Kell, T. (2019). mirdata: Software for reproducible usage of datasets. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
  6. Bittner, R., Salamon, J., Tierney, M., Mauch, M., Cannam, C., & Bello, J. (2014). MedleyDB: A multitrack dataset for annotation-intensive MIR research. In Proceedings of the 15th International Society for Music Information Retrieval Conference, Taipei, Taiwan.
  7. Bucilua, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06, pages 535–541, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/1150402.1150464
  8. Cui, J., Kingsbury, B., Ramabhadran, B., Saon, G., Sercu, T., Audhkhasi, K., Sethy, A., Nussbaum-Thom, M., & Rosenberg, A. (2017). Knowledge distillation across ensembles of multilingual models for low-resource languages. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. DOI: 10.1109/ICASSP.2017.7953073
  9. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. DOI: 10.1109/CVPR.2009.5206848
  10. Donahue, C., Mao, H. H., & McAuley, J. (2018). The NES Music Database: A multi-instrumental dataset with expressive performance attributes. In Proceedings of the 19th International Society for Music Information Retrieval Conference, Paris, France.
  11. Doras, G., Esling, P., & Peeters, G. (2019). On the use of U-net for dominant melody estimation in polyphonic music. In 2019 International Workshop on Multilayer Music Representation and Processing (MMRP), pages 66–70. DOI: 10.1109/MMRP.2019.00020
  12. Dzhambazov, G. (2017). Knowledge-based Probabilistic Modeling for Tracking Lyrics in Music Audio Signals. PhD thesis, Universitat Pompeu Fabra.
  13. Fonseca, E., Pons, J., Favory, X., Font, F., Bogdanov, D., Ferraro, A., Oramas, S., Porter, A., & Serra, X. (2017). Freesound datasets: A platform for the creation of open audio datasets. In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China.
  14. Fujihara, H., & Goto, M. (2012). Lyrics-to-audio alignment and its application. In Multimodal Music Processing, volume 3 of Dagstuhl Follow-Ups, pages 23–36. Dagstuhl, Germany.
  15. Fujihara, H., Goto, M., Ogata, J., & Okuno, H. G. (2011). LyricSynchronizer: Automatic synchronization system between musical audio signals and lyrics. IEEE Journal of Selected Topics in Signal Processing, 5(6), 1252–1261. DOI: 10.1109/JSTSP.2011.2159577
  16. Goto, M. (2014). Singing information processing. In 12th International Conference on Signal Processing, pages 2431–2438. DOI: 10.1109/ICOSP.2014.7015431
  17. Gupta, C., Tong, R., Li, H., & Wang, Y. (2018). Semi-supervised lyrics and solo-singing alignment. In Proceedings of the 19th International Society for Music Information Retrieval Conference, Paris, France.
  18. Gupta, C., Yılmaz, E., & Li, H. (2019). Acoustic modeling for automatic lyrics-to-audio alignment. arXiv preprint arXiv:1906.10369. DOI: 10.21437/Interspeech.2019-1520
  19. Hansen, J. K. (2012). Recognition of phonemes in a cappella recordings using temporal patterns and mel frequency cepstral coefficients. In Proceedings of the 9th Sound and Music Computing Conference, Copenhagen, Denmark.
  20. Hinton, G., Vinyals, O., & Dean, J. (2014). Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop.
  21. Humphrey, E. J., Montecchio, N., Bittner, R., Jansson, A., & Jehan, T. (2017). Mining labelled data from web-scale collections for vocal activity detection in music. In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China.
  22. Iskandar, D., Wang, Y., Kan, M.-Y., & Li, H. (2006). Syllabic level automatic synchronization of music signals and text lyrics. In Proceedings of the 14th ACM International Conference on Multimedia, MM ’06, pages 659–662, New York, NY, USA. ACM. DOI: 10.1145/1180639.1180777
  23. Kan, M.-Y., Wang, Y., Iskandar, D., Nwe, T. L., & Shenoy, A. (2008). LyricAlly: Automatic synchronization of textual lyrics to acoustic music signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 338–349. DOI: 10.1109/TASL.2007.911559
  24. Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, Department of Computer Science.
  25. Kruspe, A. M. (2016). Bootstrapping a system for phoneme recognition and keyword spotting in unaccompanied singing. In Proceedings of the 17th International Society for Music Information Retrieval Conference, pages 358–364, New York City, United States.
  26. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. DOI: 10.1109/5.726791
  27. Lee, S. W., & Scott, J. (2017). Word level lyrics-audio synchronization using separated vocals. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. DOI: 10.1109/ICASSP.2017.7952235
  28. Maia, L., Fuentes, M., Biscainho, L., Rocamora, M., & Essid, S. (2019). SAMBASET: A dataset of historical samba de enredo recordings for computational music analysis. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
  29. Mauch, M., Fujihara, H., & Goto, M. (2012). Integrating additional chord information into HMM-based lyrics-to-audio alignment. IEEE Transactions on Audio, Speech, and Language Processing, 20, 200–210. DOI: 10.1109/TASL.2011.2159595
  30. Mesaros, A. (2013). Singing voice identification and lyrics transcription for music information retrieval. In 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD), pages 1–10. DOI: 10.1109/SpeD.2013.6682644
  31. Mesaros, A., & Virtanen, T. (2010). Automatic recognition of lyrics in singing. EURASIP Journal on Audio, Speech, and Music Processing, 2010, 1–11. DOI: 10.1155/2010/546047
  32. Meseguer-Brocal, G., Cohen-Hadria, A., & Peeters, G. (2018). DALI: A large dataset of synchronized audio, lyrics and notes, automatically created using teacher-student machine learning paradigm. In Proceedings of the 19th International Society for Music Information Retrieval Conference, Paris, France.
  33. Meseguer-Brocal, G., Peeters, G., Pellerin, G., Buffa, M., Cabrio, E., Faron Zucker, C., Giboin, A., Mirbel, I., Hennequin, R., Moussallam, M., Piccoli, F., & Fillon, T. (2017). WASABI: A two million song database project with audio and cultural metadata plus WebAudio enhanced client applications. In Web Audio Conference, London, U.K.
  34. Müller, M., Kurth, F., Damm, D., Fremerey, C., & Clausen, M. (2007). Lyrics-based audio retrieval and multimodal navigation in music collections. In Kovács, L., Fuhr, N., & Meghini, C., editors, Research and Advanced Technology for Digital Libraries, pages 112–123. Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-540-74851-9_10
  35. Nieto, O., McCallum, M., Davies, M., Robertson, A., Stark, A., & Egozy, E. (2019). The Harmonix set: Beats, downbeats, and functional segment annotations of Western popular music. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
  36. Peeters, G., & Fort, K. (2012). Towards a (better) definition of annotated MIR corpora. In Proceedings of the 13th International Society for Music Information Retrieval Conference, Porto, Portugal.
  37. Ramona, M., Richard, G., & David, B. (2008). Vocal detection in music with support vector machines. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. DOI: 10.1109/ICASSP.2008.4518002
  38. Rivest, R. (1992). The MD5 message-digest algorithm. RFC 1321, Internet Engineering Task Force Network Working Group. DOI: 10.17487/rfc1321
  39. Schlüter, J., & Grill, T. (2015). Exploring data augmentation for improved singing voice detection with neural networks. In Proceedings of the 16th International Society for Music Information Retrieval Conference, Malaga, Spain.
  40. Settles, B. (2008). Curious Machines: Active Learning with Structured Instances. PhD thesis, University of Wisconsin–Madison.
  41. Smith, J. (2013). Correlation Analyses of Encoded Music Performance. PhD thesis, Stanford University, Music Department.
  42. Stoller, D., Durand, S., & Ewert, S. (2019). End-to-end lyrics alignment for polyphonic music using an audio-to-character recognition model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 5275–5279. DOI: 10.1109/ICASSP.2019.8683470
  43. Watanabe, S., Hori, T., Le Roux, J., & Hershey, J. (2017). Student-teacher network learning with enhanced features. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 5275–5279. DOI: 10.1109/ICASSP.2017.7953163
  44. Wong, C. H., Szeto, W. M., & Wong, K. H. (2007). Automatic lyrics alignment for Cantonese popular music. Multimedia Systems, 12(4), 307–323. DOI: 10.1007/s00530-006-0055-8
  45. Wu, C., & Lerch, A. (2017). Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data. In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China.
  46. Yesiler, F., Tralie, C., Correya, A., Furtado Silva, D., Tovstogan, P., Gómez, E., & Serra, X. (2019). Da-TACOS: A dataset for cover song identification and understanding. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
DOI: https://doi.org/10.5334/tismir.30 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jan 24, 2019
Accepted on: Apr 9, 2020
Published on: Jun 11, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Gabriel Meseguer-Brocal, Alice Cohen-Hadria, Geoffroy Peeters, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.