References
- Balke, S., Achankunju, S. P., & Müller, M. (2015). Matching musical themes based on noisy OCR and OMR input. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 703–707. Brisbane, Australia. DOI: 10.1109/ICASSP.2015.7178060
- Balke, S., Arifi-Müller, V., Lamprecht, L., & Müller, M. (2016). Retrieving audio recordings using musical themes. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 281–285. Shanghai, China. DOI: 10.1109/ICASSP.2016.7471681
- Böck, S., & Schedl, M. (2012). Polyphonic piano note transcription with recurrent neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 121–124. Kyoto, Japan. DOI: 10.1109/ICASSP.2012.6287832
- Byrd, D., & Simonsen, J. G. (2015). Towards a standard testbed for optical music recognition: Definitions, metrics, and page images. Journal of New Music Research, 44(3), 169–195. DOI: 10.1080/09298215.2015.1045424
- Casey, M. A., Rhodes, C., & Slaney, M. (2008). Analysis of minimum distances in high-dimensional musical spaces. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), 1015–1028. DOI: 10.1109/TASL.2008.925883
- Cheng, T., Mauch, M., Benetos, E., & Dixon, S. (2016). An attack/decay model for piano transcription. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 584–590. New York City, United States.
- Clevert, D., Unterthiner, T., & Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). International Conference on Learning Representations (ICLR) (arXiv:1511.07289).
- Dorfer, M., Arzt, A., & Widmer, G. (2016). Towards score following in sheet music images. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 789–795. New York City, United States.
- Dorfer, M., Arzt, A., & Widmer, G. (2017a). Learning audio-sheet music correspondences for score identification and offline alignment. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), 115–122. Suzhou, China.
- Dorfer, M., Hajič, J., Jr., & Widmer, G. (2017b). On the potential of fully convolutional neural networks for musical symbol detection. In Proceedings of the 12th IAPR International Workshop on Graphics Recognition, 53–54. Kyoto, Japan. DOI: 10.1109/ICDAR.2017.274
- Dorfer, M., Schlüter, J., Vall, A., Korzeniowski, F., & Widmer, G. (2018). End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss. International Journal of Multimedia Information Retrieval, 7(2), 117–128. DOI: 10.1007/s13735-018-0151-5
- Fremerey, C., Clausen, M., Ewert, S., & Müller, M. (2009). Sheet music-audio identification. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR), 645–650. Kobe, Japan.
- Gallego, A.-J., & Calvo-Zaragoza, J. (2017). Staffline removal with selectional auto-encoders. Expert Systems with Applications, 89, 138–148. DOI: 10.1016/j.eswa.2017.07.002
- Grachten, M., Gasser, M., Arzt, A., & Widmer, G. (2013). Automatic alignment of music performances with structural differences. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), 607–612. Curitiba, Brazil.
- Hajič, J., Jr., & Pecina, P. (2017). The MUSCIMA++ dataset for handwritten optical music recognition. In 14th International Conference on Document Analysis and Recognition (ICDAR), 39–46. New York, United States.
- Henriques, J. F., Carreira, J., Caseiro, R., & Batista, J. (2013). Beyond hard negative mining: Efficient detector learning via block-circulant decomposition. In IEEE International Conference on Computer Vision (ICCV), 2760–2767. Sydney, Australia. DOI: 10.1109/ICCV.2013.343
- Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 448–456. Lille, France.
- Izmirli, Ö., & Sharma, G. (2012). Bridging printed music and audio through alignment using a midlevel score representation. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), 61–66. Porto, Portugal.
- Kelz, R., Dorfer, M., Korzeniowski, F., Böck, S., Arzt, A., & Widmer, G. (2016). On the potential of simple framewise approaches to piano transcription. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 475–481. New York City, United States.
- Kingma, D., & Ba, J. (2015). Adam: A method for stochastic optimization. International Conference on Learning Representations (ICLR) (arXiv:1412.6980).
- Kiros, R., Salakhutdinov, R., & Zemel, R. S. (2014). Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint (arXiv:1411.2539).
- Kurth, F., Müller, M., Fremerey, C., Chang, Y., & Clausen, M. (2007). Automated synchronization of scanned sheet music with audio recordings. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR), 261–266. Vienna, Austria.
- Lin, M., Chen, Q., & Yan, S. (2014). Network in network. International Conference on Learning Representations (ICLR) (arXiv:1312.4400).
- McFee, B., Humphrey, E. J., & Bello, J. P. (2015). A software framework for musical data augmentation. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 248–254. Málaga, Spain.
- Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R. S., Guedes, C., & Cardoso, J. S. (2012). Optical music recognition: state-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1(3), 173–190. DOI: 10.1007/s13735-012-0004-6
- Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241. Munich, Germany.
- Rosasco, L., Vito, E. D., Caponnetto, A., Piana, M., & Verri, A. (2004). Are loss functions all the same? Neural Computation, 16(5), 1063–1076. DOI: 10.1162/089976604773135104
- Sigtia, S., Benetos, E., & Dixon, S. (2016). An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(5), 927–939.
- Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (ICLR) (arXiv:1409.1556).
- Socher, R., Karpathy, A., Le, Q. V., Manning, C. D., & Ng, A. Y. (2014). Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics, 2, 207–218.
- Wen, C., Rebelo, A., Zhang, J., & Cardoso, J. (2015). A new optical music recognition system based on combined neural network. Pattern Recognition Letters, 58, 1–7. DOI: 10.1016/j.patrec.2015.02.002
