Investigating the Perceptual Validity of Evaluation Metrics for Automatic Piano Music Transcription

Open Access | Jun 2020

Figures & Tables

Figure 1

Screenshot of the listening test website.

Table 1

Benchmark evaluation metrics for all systems, evaluated on the MAPS subsets ENSTDkCl and ENSTDkAm, with best values in bold.

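System | Pf | Rf | Ff | Pn,On | Rn,On | Fn,On | Pn,OnOff | Rn,OnOff | Fn,OnOff
STF | 67.2 | 60.0 | 62.7 | 49.8 | 32.0 | 38.3 | 16.5 | 11.3 | 13.2
CNN | 80.2 | 58.2 | 66.1 | 77.0 | 54.9 | 63.2 | 33.5 | 24.6 | 28.0
NMF | 71.3 | 63.3 | 66.4 | 79.6 | 57.0 | 65.7 | 35.7 | 26.4 | 30.0
OAF | 89.0 | 79.5 | 83.8 | 85.9 | 84.1 | 84.9 | 66.9 | 65.5 | 66.2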
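
As an illustration of how the P, R and F columns relate, the following is a minimal sketch of precision, recall and F-measure computed from true-positive, false-positive and false-negative counts, applied to a framewise comparison. The representation of a transcription as (frame index, pitch) pairs, the function names and the toy data are assumptions made for this example; they are not taken from the article, whose reported values may also be aggregated differently (for instance, averaged per piece).

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def framewise_counts(reference, estimate):
    """Count TP, FP, FN over collections of (frame_index, pitch) pairs.

    Assumption: both transcriptions are already sampled on the same frame grid.
    """
    ref, est = set(reference), set(estimate)
    tp = len(ref & est)   # active in both
    fp = len(est - ref)   # active only in the estimate
    fn = len(ref - est)   # active only in the reference
    return tp, fp, fn


# Hypothetical toy data: (frame index, MIDI pitch)
reference = [(0, 60), (1, 60), (2, 60), (2, 64)]
estimate = [(0, 60), (1, 60), (2, 64), (3, 64)]

p, r, f = precision_recall_f(*framewise_counts(reference, estimate))
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Changing the frame size changes which (frame, pitch) cells exist, which is presumably why Ff can be evaluated at several frame sizes.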
Figure 2

Vote proportion in pairwise comparisons of the systems. Blue bars represent the proportion of times the system on the left was chosen over the one on the right. For each pair, the percentage in parentheses is the average Fn,On computed on the specific examples included in the comparison.

Figure 3

Proportion of agreement, across all examples, between raters and various evaluation metrics (Ff with various frame sizes, and Fn,On with various tolerance thresholds).
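
One plausible reading of "proportion of agreement" is the fraction of pairwise ratings in which the rater chose the transcription that the metric scores higher. The sketch below follows that reading; the data layout, the function name and the handling of ties (skipped here) are assumptions for illustration, not details confirmed by the caption.

```python
def agreement_proportion(ratings):
    """Fraction of ratings where the rater's choice matches the metric's.

    ratings: iterable of (score_first, score_second, choice), where the scores
    are the metric values of the two candidate transcriptions and choice is
    0 if the rater preferred the first, 1 if they preferred the second.
    Assumption: pairs with equal scores express no metric preference and are skipped.
    """
    agree = 0
    counted = 0
    for score_first, score_second, choice in ratings:
        if score_first == score_second:
            continue
        metric_choice = 0 if score_first > score_second else 1
        counted += 1
        if metric_choice == choice:
            agree += 1
    return agree / counted if counted else float("nan")


# Hypothetical ratings, not data from the study
ratings = [
    (0.80, 0.60, 0),  # rater and metric both prefer the first
    (0.55, 0.70, 0),  # rater prefers the first, metric the second
    (0.40, 0.90, 1),  # rater and metric both prefer the second
    (0.50, 0.50, 1),  # tie in the metric: skipped
]
print(agreement_proportion(ratings))
```

Recomputing the scores with a different frame size or onset tolerance and calling the function again would give one point on a curve like those in the figure.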

Figure 4

Proportion of agreement, across all examples, between raters and Fn,OnOff, with various onset and offset tolerance thresholds.
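
To make the role of the tolerance thresholds concrete, here is a simplified sketch of notewise matching: an estimated note counts as correct when its pitch equals that of a reference note, its onset lies within the onset tolerance and, optionally, its offset lies within the offset tolerance. The greedy one-to-one matching, the fixed offset threshold in seconds, the note representation and all names are assumptions made for this example; standard evaluation toolkits may use optimal matching and duration-dependent offset tolerances instead.

```python
def count_matches(reference, estimate, onset_tol=0.05, offset_tol=None):
    """Greedy one-to-one note matching; returns (tp, fp, fn).

    Notes are (onset_seconds, offset_seconds, midi_pitch) tuples.
    offset_tol=None ignores offsets (onset-only matching).
    """
    unmatched = list(estimate)
    tp = 0
    for ref_on, ref_off, ref_pitch in sorted(reference):
        candidates = [
            note for note in unmatched
            if note[2] == ref_pitch
            and abs(note[0] - ref_on) <= onset_tol
            and (offset_tol is None or abs(note[1] - ref_off) <= offset_tol)
        ]
        if candidates:
            # Keep the candidate whose onset is closest to the reference onset
            best = min(candidates, key=lambda note: abs(note[0] - ref_on))
            unmatched.remove(best)
            tp += 1
    return tp, len(unmatched), len(reference) - tp


# Hypothetical toy notes: (onset, offset, MIDI pitch)
reference = [(0.00, 0.50, 60), (0.50, 1.00, 64), (1.00, 1.50, 67)]
estimate = [(0.02, 0.45, 60), (0.58, 1.00, 64), (1.01, 1.90, 67)]

print(count_matches(reference, estimate, onset_tol=0.05))
print(count_matches(reference, estimate, onset_tol=0.10))
print(count_matches(reference, estimate, onset_tol=0.10, offset_tol=0.10))
```

In this toy case, widening the onset tolerance admits the late second note, while adding an offset tolerance rejects the third note, whose offset is far too late.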

Table 2

Coefficients and p-values for the linear fixed effects model using agreement with Fn,On as dependent variable and features as fixed effects.

Feature | Coefficient | P-value
ΔF | 0.539 | <0.001
Fbest | 0.330 | <0.001
Gold-MSI | -0.007 | 0.232
Known | 0.014 | 0.391
Difficulty | -0.044 | <0.001
Figure 5

Agreement between ratings and Fn,On for each reported difficulty level.

Figure 6

Distribution of difficulty ratings (lightest = 1, darkest = 5) for each pair of systems.

Table 3

Coefficients and p-values for the linear fixed effects model using difficulty as dependent variable and features as fixed effects.

Feature | Coefficient | P-value
ΔF | -1.564 | <0.001
Fbest | -0.608 | <0.001
Gold-MSI | -0.227 | <0.001
Known | -0.153 | 0.002
Agree | -0.423 | <0.001
Table 4

Coefficients and p-values for the linear fixed effects model using agreement with Fn,On as dependent variable and features as fixed effects, on confident answers only.

Feature | Coefficient | P-value
ΔF | 0.584 | <0.001
Fbest | 0.349 | <0.001
Gold-MSI | -0.014 | 0.011
Known | 0.002 | 0.912
Difficulty | -0.036 | <0.001
Figure 7

Proportion of agreement depending on the difference in Fn,On between the two options, computed on confident answers only.

Table 5

Coefficients and p-values for the linear fixed effects model using agreement among raters as dependent variable and features as fixed effects.

Feature | Coefficient | P-value
ΔF | 0.496 | <0.001
Fbest | -0.092 | 0.423
Gold-MSIavg | -0.071 | 0.004
Gold-MSIstd | -0.016 | 0.778
Difficultyavg | -0.176 | 0.003
Figure 8

Aconf measure for each tested configuration, averaged across folds. The dotted line represents Aconf for Fn,On. Descriptions of each configuration are given in Table 6. Colors represent the p-value when testing whether each metric is different from the “All” configuration. Asterisks represent results significantly different from All (*: p < 0.1, **: p < 0.05, ***: p < 0.01).

Table 6

Description of each tested feature configuration.

Configuration | Removed features
All | None
NoBench | Benchmark metrics
NoFeatures | All features, except benchmark metrics
NoHighLow | Mistakes in highest and lowest notes
NoLoud | Loudness of false negatives
NoOutKey | Out-of-key false positives
NoRepeat | Repeated and merged notes
NoSpecific | Specific pitch mistakes
NoPoly | Polyphony level difference
NoRhythm | Rhythm histogram flatness and rhythm dispersion
NoFramewise | Framewise benchmark metrics, framewise highest and lowest note mistakes, framewise specific pitch errors, polyphony level difference, consonance measures
NoSpecOut | Specific pitch mistakes and out-of-key false positives
DOI: https://doi.org/10.5334/tismir.57 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 1, 2020
Accepted on: Apr 20, 2020
Published on: Jun 12, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Adrien Ycart, Lele Liu, Emmanouil Benetos, Marcus T. Pearce, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.