Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

Open Access | Sep 2018

References

  1. Balke, S., Achankunju, S. P., & Müller, M. (2015). Matching musical themes based on noisy OCR and OMR input. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 703–707. Brisbane, Australia. DOI: 10.1109/ICASSP.2015.7178060
  2. Balke, S., Arifi-Müller, V., Lamprecht, L., & Müller, M. (2016). Retrieving audio recordings using musical themes. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 281–285. Shanghai, China. DOI: 10.1109/ICASSP.2016.7471681
  3. Böck, S., & Schedl, M. (2012). Polyphonic piano note transcription with recurrent neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 121–124. Kyoto, Japan. DOI: 10.1109/ICASSP.2012.6287832
  4. Byrd, D., & Simonsen, J. G. (2015). Towards a standard testbed for optical music recognition: Definitions, metrics, and page images. Journal of New Music Research, 44(3), 169–195. DOI: 10.1080/09298215.2015.1045424
  5. Casey, M. A., Rhodes, C., & Slaney, M. (2008). Analysis of minimum distances in high-dimensional musical spaces. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), 1015–1028. DOI: 10.1109/TASL.2008.925883
  6. Cheng, T., Mauch, M., Benetos, E., & Dixon, S. (2016). An attack/decay model for piano transcription. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 584–590. New York City, United States.
  7. Clevert, D., Unterthiner, T., & Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). International Conference on Learning Representations (ICLR) (arXiv:1511.07289).
  8. Dorfer, M., Arzt, A., & Widmer, G. (2016). Towards score following in sheet music images. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 789–795. New York City, United States.
  9. Dorfer, M., Arzt, A., & Widmer, G. (2017a). Learning audio-sheet music correspondences for score identification and offline alignment. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), 115–122. Suzhou, China.
  10. Dorfer, M., Hajič, J., Jr., & Widmer, G. (2017b). On the Potential of Fully Convolutional Neural Networks for Musical Symbol Detection. In Proceedings of the 12th IAPR International Workshop on Graphics Recognition, 53–54. Kyoto, Japan. DOI: 10.1109/ICDAR.2017.274
  11. Dorfer, M., Schlüter, J., Vall, A., Korzeniowski, F., & Widmer, G. (2018). End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss. International Journal of Multimedia Information Retrieval, 7(2), 117–128. DOI: 10.1007/s13735-018-0151-5
  12. Fremerey, C., Clausen, M., Ewert, S., & Müller, M. (2009). Sheet music-audio identification. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR), 645–650. Kobe, Japan.
  13. Gallego, A.-J., & Calvo-Zaragoza, J. (2017). Staffline removal with selectional auto-encoders. Expert Systems with Applications, 89, 138–148. DOI: 10.1016/j.eswa.2017.07.002
  14. Grachten, M., Gasser, M., Arzt, A., & Widmer, G. (2013). Automatic alignment of music performances with structural differences. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), 607–612. Curitiba, Brazil.
  15. Hajič, J., Jr., & Pecina, P. (2017). The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. In 14th International Conference on Document Analysis and Recognition (ICDAR), 39–46. New York, United States.
  16. Henriques, J. F., Carreira, J., Caseiro, R., & Batista, J. (2013). Beyond hard negative mining: Efficient detector learning via block-circulant decomposition. In IEEE International Conference on Computer Vision (ICCV), 2760–2767. Sydney, Australia. DOI: 10.1109/ICCV.2013.343
  17. Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 448–456. Lille, France.
  18. Izmirli, Ö., & Sharma, G. (2012). Bridging printed music and audio through alignment using a midlevel score representation. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), 61–66. Porto, Portugal.
  19. Kelz, R., Dorfer, M., Korzeniowski, F., Böck, S., Arzt, A., & Widmer, G. (2016). On the potential of simple framewise approaches to piano transcription. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 475–481. New York City, United States.
  20. Kingma, D., & Ba, J. (2015). Adam: A method for stochastic optimization. International Conference on Learning Representations (ICLR) (arXiv:1412.6980).
  21. Kiros, R., Salakhutdinov, R., & Zemel, R. S. (2014). Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint (arXiv:1411.2539).
  22. Kurth, F., Müller, M., Fremerey, C., Chang, Y., & Clausen, M. (2007). Automated synchronization of scanned sheet music with audio recordings. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR), 261–266. Vienna, Austria.
  23. Lin, M., Chen, Q., & Yan, S. (2014). Network in network. International Conference on Learning Representations (ICLR) (arXiv:1312.4400).
  24. McFee, B., Humphrey, E. J., & Bello, J. P. (2015). A software framework for musical data augmentation. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 248–254. Málaga, Spain.
  25. Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R. S., Guedes, C., & Cardoso, J. S. (2012). Optical music recognition: state-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1(3), 173–190. DOI: 10.1007/s13735-012-0004-6
  26. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241. Munich, Germany.
  27. Rosasco, L., Vito, E. D., Caponnetto, A., Piana, M., & Verri, A. (2004). Are loss functions all the same? Neural Computation, 16(5), 1063–1076. DOI: 10.1162/089976604773135104
  28. Sigtia, S., Benetos, E., & Dixon, S. (2016). An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(5), 927–939.
  29. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (ICLR) (arXiv:1409.1556).
  30. Socher, R., Karpathy, A., Le, Q. V., Manning, C. D., & Ng, A. Y. (2014). Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics, 2, 207–218.
  31. Wen, C., Rebelo, A., Zhang, J., & Cardoso, J. (2015). A new optical music recognition system based on combined neural network. Pattern Recognition Letters, 58, 1–7. DOI: 10.1016/j.patrec.2015.02.002
DOI: https://doi.org/10.5334/tismir.12 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jan 25, 2018
Accepted on: Mar 20, 2018
Published on: Sep 4, 2018
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2018 Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.