References
- Bittner, R. M., Salamon, J., Tierney, M., Mauch, M., Cannam, C., and Bello, J. P. (2014). MedleyDB: A multitrack dataset for annotation-intensive MIR research. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), pp. 155–160.
- Böck, S., and Davies, M. E. P. (2020). Deconstruct, analyse, reconstruct: How to improve tempo, beat, and downbeat estimation. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), pp. 574–582.
- Böck, S., Korzeniowski, F., Schlüter, J., Krebs, F., and Widmer, G. (2016). madmom: A new Python audio and music signal processing library. In Proceedings of the ACM Multimedia Conference, pp. 1174–1178. ACM.
- Bogdanov, D., Won, M., Tovstogan, P., Porter, A., and Serra, X. (2019). The MTG‑Jamendo dataset for automatic music tagging. In Machine Learning for Music Discovery Workshop, International Conference on Machine Learning, Long Beach, CA, United States.
- Buda, M., Maki, A., and Mazurowski, M. A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106, 249–259.
- Cano, P., Gómez, E., Gouyon, F., Herrera, P., Koppenberger, M., Ong, B., Serra, X., Streich, S., and Wack, N. (2006). ISMIR 2004 Audio Description Contest. Technical report, Music Technology Group, Universitat Pompeu Fabra.
- Dannenberg, R. B. (2006). The interpretation of MIDI velocity. In Proceedings of the International Computer Music Conference (ICMC). Michigan Publishing.
- Dittmar, C., and Gärtner, D. (2014). Real-time transcription and separation of drum recordings based on NMF decomposition. In Proceedings of the 17th International Conference on Digital Audio Effects (DAFx-14), pp. 187–194.
- Fabbro, G., Uhlich, S., Lai, C., Choi, W., Ramírez, M. A. M., Liao, W., Gadelha, I., Ramos, G., Hsu, E., Rodrigues, H., Stöter, F.-R., Défossez, A., Luo, Y., Yu, J., Chakraborty, D., Mohanty, S. P., Solovyev, R. A., Stempkovskiy, A. L., Habruseva, T., . . . Mitsufuji, Y. (2024). The sound demixing challenge 2023 – music demixing track. Transactions of the International Society for Music Information Retrieval, 7(1), 63–84.
- Gillet, O., and Richard, G. (2006). ENST-Drums: An extensive audio-visual database for drum signals processing. In Proceedings of the 7th International Society for Music Information Retrieval Conference (ISMIR), pp. 156–159.
- Gillick, J., Roberts, A., Engel, J., Eck, D., and Bamman, D. (2019). Learning to groove with inverse sequence transformations. In Proceedings of the International Conference on Machine Learning (ICML).
- Goto, M., Hashiguchi, H., Nishimura, T., and Oka, R. (2002). RWC Music Database: Popular, classical and jazz music databases. In Proceedings of the 3rd International Society for Music Information Retrieval Conference (ISMIR).
- Ishizuka, R., Nishikimi, R., Nakamura, E., and Yoshii, K. (2020). Tatum‑level drum transcription based on a convolutional recurrent neural network with language model‑based regularized training. In Proceedings of the Asia‑Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 359–364. IEEE.
- Jacques, C., and Roebel, A. (2019). Data augmentation for drum transcription with convolutional neural networks. In Proceedings of the 27th European Signal Processing Conference (EUSIPCO), pp. 1–5. IEEE.
- Jeon, C., Wichern, G., Germain, F. G., and Le Roux, J. (2024). Why does music source separation benefit from cacophony? In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 873–877. IEEE.
- Manilow, E., Wichern, G., Seetharaman, P., and Le Roux, J. (2019). Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 45–49. IEEE.
- Mezza, A. I., Giampiccolo, R., Bernardini, A., and Sarti, A. (2024). Toward deep drum source separation. Pattern Recognition Letters, 183, 86–91.
- Özer, Y., and Müller, M. (2024). Source separation of piano concertos using musically motivated augmentation techniques. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32, 1214–1225.
- Park, D. S., Chan, W., Zhang, Y., Chiu, C., Zoph, B., Cubuk, E. D., and Le, Q. V. (2019). SpecAugment: A simple data augmentation method for automatic speech recognition. In Proceedings of the 20th Annual Conference of the International Speech Communication Association (Interspeech), pp. 2613–2617. ISCA.
- Raffel, C. (2016). Learning‑Based Methods for Comparing Sequences, With Applications to Audio‑to‑MIDI Alignment and Matching [PhD thesis]. Columbia University.
- Raffel, C., McFee, B., Humphrey, E. J., Salamon, J., Nieto, O., Liang, D., and Ellis, D. P. W. (2014). mir_eval: A transparent implementation of common MIR metrics. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), pp. 367–372.
- Rafii, Z., Liutkus, A., Stöter, F.‑R., Mimilakis, S. I., and Bittner, R. (2017). The MUSDB18 corpus for music separation. arXiv:1703.04178.
- Rouard, S., Massa, F., and Défossez, A. (2023). Hybrid transformers for music source separation. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE.
- Salamon, J., Bittner, R. M., Bonada, J., Bosch, J. J., Gómez, E., and Bello, J. P. (2017). An analysis/synthesis framework for automatic F0 annotation of multitrack datasets. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), pp. 71–78.
- Southall, C., Wu, C.-W., Lerch, A., and Hockman, J. (2017). MDB Drums: An annotated subset of MedleyDB for automatic drum transcription. arXiv:1710.01813.
- Strahl, S., and Müller, M. (2024). Semi‑supervised piano transcription using pseudo‑labeling techniques. In Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR), pp. 173–181.
- Vogl, R., Dorfer, M., Widmer, G., and Knees, P. (2017). Drum transcription via joint beat and drum modeling using convolutional recurrent neural networks. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), pp. 150–157.
- Vogl, R., Widmer, G., and Knees, P. (2018). Towards multi‑instrument drum transcription. In Proceedings of the 21st International Conference on Digital Audio Effects (DAFx‑18).
- Weber, P., Uhle, C., Müller, M., and Lang, M. (2024). Real‑time automatic drum transcription using dynamic few‑shot learning. In Proceedings of the 5th International Symposium on the Internet of Sounds (IS2). IEEE.
- Wei, I., Wu, C., and Su, L. (2021). Improving automatic drum transcription using large-scale audio-to-MIDI aligned data. In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 246–250. IEEE.
- Wu, C., Dittmar, C., Southall, C., Vogl, R., Widmer, G., Hockman, J., Müller, M., and Lerch, A. (2018). A review of automatic drum transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(9), 1457–1483.
- Wu, C., and Lerch, A. (2018). From labeled to unlabeled data – on the data challenge in automatic drum transcription. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), pp. 445–452.
- Yang, X., Song, Z., King, I., and Xu, Z. (2023). A survey on deep semi‑supervised learning. IEEE Transactions on Knowledge and Data Engineering, 35(9), 8934–8954.
- Zehren, M., Alunno, M., and Bientinesi, P. (2021). ADTOF: A large dataset of non‑synthetic music for automatic drum transcription. In Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR), pp. 818–824.
- Zehren, M., Alunno, M., and Bientinesi, P. (2023). High‑quality and reproducible automatic drum transcription from crowdsourced data. Signals, 4(4), 768–787.
- Zehren, M., Alunno, M., and Bientinesi, P. (2024). Analyzing and reducing the synthetic-to-real transfer gap in music information retrieval: The task of automatic drum transcription. arXiv:2407.19823.
- Zhang, C.‑B., Jiang, P.‑T., Hou, Q., Wei, Y., Han, Q., Li, Z., and Cheng, M.‑M. (2021). Delving deep into label smoothing. IEEE Transactions on Image Processing, 30, 5984–5996.
