Investigating the Perceptual Validity of Evaluation Metrics for Automatic Piano Music Transcription

Open Access | Jun 2020

Abstract

Automatic Music Transcription (AMT) is usually evaluated using low-level criteria, typically by counting the number of errors, with equal weighting. Yet some errors (e.g. out-of-key notes) are more salient than others. In this study, we design an online listening test to gather judgements about AMT quality. These judgements take the form of pairwise comparisons between transcriptions of the same music produced by two different AMT systems. We investigate how these judgements correlate with benchmark metrics, and find that although they match in many cases, agreement drops when comparing pairs with similar scores, or pairs of poor transcriptions. We show that onset-only notewise F-measure is the benchmark metric that correlates best with human judgement, and that this correlation increases with higher onset tolerance thresholds. We define a set of features related to various musical attributes, and use them to design a new metric that correlates significantly better with listeners’ quality judgements. We examine which musical aspects were important to raters by conducting an ablation study on the defined metric, highlighting the importance of the rhythmic dimension (tempo, meter). We make the collected data entirely available for further study, in particular to evaluate the perceptual relevance of new AMT metrics.
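
For readers unfamiliar with the benchmark metric named above, the following is a minimal sketch of an onset-only notewise F-measure, not the authors' implementation. It assumes notes are given as (onset in seconds, MIDI pitch) pairs, that an estimated note matches a reference note when the pitches are equal and the onsets differ by at most a tolerance, and that each note is matched at most once. The function name, the note representation, and the 50 ms default tolerance are illustrative assumptions.

```python
def onset_f_measure(reference, estimated, tolerance=0.05):
    """Onset-only notewise F-measure (illustrative sketch).

    reference, estimated: lists of (onset_seconds, midi_pitch) tuples.
    tolerance: maximum allowed onset deviation in seconds (assumed default).
    """
    # For each reference note, list the estimated notes it could match:
    # same pitch, onset within the tolerance window.
    candidates = [
        [j for j, (e_on, e_pitch) in enumerate(estimated)
         if e_pitch == r_pitch and abs(e_on - r_on) <= tolerance]
        for r_on, r_pitch in reference
    ]
    match_of_est = {}  # estimated index -> matched reference index

    def try_match(i, seen):
        # Augmenting-path search so that each note is used at most once
        # and the number of matched pairs is maximal.
        for j in candidates[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in match_of_est or try_match(match_of_est[j], seen):
                match_of_est[j] = i
                return True
        return False

    true_positives = sum(try_match(i, set()) for i in range(len(reference)))
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the second note is 80 ms late, the third has a wrong pitch.
reference = [(0.0, 60), (0.5, 64), (1.0, 67)]
estimated = [(0.02, 60), (0.58, 64), (1.0, 66)]

f_strict = onset_f_measure(reference, estimated, tolerance=0.05)
f_loose = onset_f_measure(reference, estimated, tolerance=0.1)
```

In this toy case the late note is rejected at the 50 ms tolerance but accepted at 100 ms, so the score rises as the threshold is relaxed, while the wrong-pitch note is penalised either way.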

DOI: https://doi.org/10.5334/tismir.57 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 1, 2020
Accepted on: Apr 20, 2020
Published on: Jun 12, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Adrien Ycart, Lele Liu, Emmanouil Benetos, Marcus T. Pearce, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.