
Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

Open Access | Sep 2018

Abstract

This work addresses the problem of matching musical audio directly to sheet music, without any higher-level abstract representation. We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks. Given the learned representations, we show how to utilize them for two sheet-music-related tasks: (1) piece/score identification from audio queries and (2) retrieving relevant performances given a score as a search query. All retrieval models are trained and evaluated on a new, large-scale multimodal audio–sheet music dataset which is made publicly available along with this article. The dataset comprises 479 precisely annotated solo piano pieces by 53 composers, for a total of 1,129 pages of music and about 15 hours of aligned audio, which was synthesized from these scores. Going beyond this synthetic training data, we carry out first retrieval experiments using scans of real sheet music of high complexity (e.g., nearly the complete solo piano works by Frederic Chopin) and commercial recordings by famous concert pianists. Our results suggest that the proposed method, in combination with the large-scale dataset, yields retrieval models that successfully generalize to data well beyond the synthetic training data used for model building.
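To make the retrieval setup concrete, the sketch below illustrates the general idea of a two-branch embedding model and nearest-neighbour retrieval in a shared space: one CNN branch embeds audio excerpts (e.g., spectrogram snippets), another embeds sheet-music image snippets, and queries are ranked by cosine similarity. This is a minimal, hypothetical illustration, not the authors' implementation; the network sizes, input shapes, and names (`Branch`, `audio_branch`, `sheet_branch`) are assumptions for demonstration only, and the actual models would be trained with a ranking or contrastive objective so that matching audio/sheet pairs embed close together.

```python
# Hypothetical sketch (PyTorch): two-branch embedding model for cross-modal
# audio-to-sheet-music retrieval. Not the authors' code or architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Branch(nn.Module):
    """Small CNN mapping one modality to an L2-normalized embedding."""

    def __init__(self, in_channels: int, embed_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # handles variable input sizes
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        # L2-normalize so dot products equal cosine similarities.
        return F.normalize(self.fc(h), dim=1)


audio_branch = Branch(in_channels=1)   # embeds audio excerpts (spectrograms)
sheet_branch = Branch(in_channels=1)   # embeds sheet-music image snippets

# Retrieval: embed an audio query and rank candidate score snippets by
# cosine similarity in the shared embedding space (dummy data shown).
audio_query = torch.randn(1, 1, 92, 42)          # one spectrogram excerpt
score_snippets = torch.randn(100, 1, 160, 200)   # candidate sheet snippets

with torch.no_grad():
    q = audio_branch(audio_query)                  # shape (1, 32)
    db = sheet_branch(score_snippets)              # shape (100, 32)
    ranking = (q @ db.T).argsort(descending=True)  # best matches first
```

The same machinery runs in the other direction for the second task described in the abstract: embed a score snippet as the query and rank a database of audio excerpts, then aggregate snippet-level matches to identify whole pieces or performances.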

DOI: https://doi.org/10.5334/tismir.12 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jan 25, 2018
Accepted on: Mar 20, 2018
Published on: Sep 4, 2018
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2018 Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.