
Figures & Tables

tismir-6-1-149-g1.png
Figure 1

Differences between note alignments (left) and sequence alignments (right, e.g. produced by DTW). Note alignments can feature unaligned elements and the aligned note pairs are not guaranteed to be strictly ordered in time.

tismir-6-1-149-g2.png
Figure 2

The steps involved in our proposed models: coarse sequence alignment, segmentation, fine-grained sequence alignment, note matching, and segment mending. For the anchor point-based models treated in Section 4, the first step is replaced by existing anchor points.

tismir-6-1-149-g3.png
Figure 3

Pitch-wise symbolic note matching based on minimal cumulative distance between warped notes. Three performance notes (top row) are matched to two score notes (bottom row). First, the score notes are projected to the performance time domain using a time mapping (blue lines). Second, the distances of the projected score onsets from all three (3 choose 2) 2-combinations of the three performance onsets are computed (rows two to four). Finally, the two performance notes minimizing the cumulative distance (red bars) are aligned to the score notes (yellow lines).
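The matching step in this caption can be sketched as a brute-force search over combinations. This is only an illustrative sketch, not the authors' implementation: the function name `match_notes`, the `time_map` callable, and the example values are assumptions, and it only covers the case shown in the figure, where there are at least as many performance notes as score notes for a given pitch.

```python
from itertools import combinations


def match_notes(score_onsets, performance_onsets, time_map):
    """Match score notes of one pitch to performance notes of the same pitch.

    score_onsets: score onset times (e.g. in beats).
    performance_onsets: performance onset times in seconds.
    time_map: hypothetical callable projecting score time to performance time.
    Returns (pairs, cost), where pairs are (score_index, performance_index)
    into the sorted onset lists.
    """
    projected = [time_map(s) for s in sorted(score_onsets)]
    performed = sorted(performance_onsets)
    if len(performed) < len(projected):
        raise ValueError("sketch assumes at least as many performance notes")

    best_combo, best_cost = None, float("inf")
    # Enumerate every ordered subset of performance notes of the right size.
    for combo in combinations(range(len(performed)), len(projected)):
        # Cumulative distance between projected score onsets and candidates.
        cost = sum(abs(performed[i] - p) for i, p in zip(combo, projected))
        if cost < best_cost:
            best_combo, best_cost = combo, cost
    return list(zip(range(len(projected)), best_combo)), best_cost


# Two score notes, three performance notes, and an assumed linear time map.
pairs, cost = match_notes(
    score_onsets=[0.0, 1.0],
    performance_onsets=[0.2, 1.05, 1.48],
    time_map=lambda beat: 1.0 + 0.5 * beat,
)
print(pairs, round(cost, 2))
```

Exhaustive enumeration grows combinatorially with the number of notes, so a sketch like this is only practical for the short, pitch-wise segments the caption describes.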

Table 1

Computation of precision and recall for a simple case of four notes; score notes sn1, sn2, and performance notes pn1, pn2. m() denotes a match, d() and i() deletions and insertions, respectively.

| VALUE | EXAMPLE |
|---|---|
| Prediction | m(sn1, pn1), m(sn2, pn2) |
| Ground truth | d(sn1), i(pn1), m(sn2, pn2) |
| True Positive | m(sn2, pn2) |
| False Positive | m(sn1, pn1) |
| False Negative | d(sn1), i(pn1) |
| Precision | 1/2 (= TP/(TP + FP)) |
| Recall | 1/3 (= TP/(TP + FN)) |

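The four-note example of Table 1 can be reproduced with plain set operations. The sketch below is an illustration under assumptions: alignment entries are encoded as tuples such as `("m", "sn1", "pn1")`, and the function name `precision_recall` is hypothetical rather than taken from the authors' code.

```python
def precision_recall(prediction, ground_truth):
    """Compute precision and recall over sets of alignment entries."""
    tp = len(prediction & ground_truth)   # entries present in both
    fp = len(prediction - ground_truth)   # predicted but not in ground truth
    fn = len(ground_truth - prediction)   # in ground truth but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Entries from Table 1: m = match, d = deletion, i = insertion.
prediction = {("m", "sn1", "pn1"), ("m", "sn2", "pn2")}
ground_truth = {("d", "sn1"), ("i", "pn1"), ("m", "sn2", "pn2")}

precision, recall = precision_recall(prediction, ground_truth)
print(precision, recall)
```

With these inputs there is one true positive, one false positive, and two false negatives, which gives the precision of 1/2 and recall of 1/3 listed in the table.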
Table 2

Dataset-wise averaged F-scores of each model. Values marked with * are not statistically different from Nakamura’s (α = 0.01).

| MODEL | 4×22 | ZEILINGER | MAGALOFF |
|---|---|---|---|
| hDTW+sym | 98.53 % | 97.98 %* | 94.57 %* |
| hNWTW+sym | 97.38 % | 95.07 %* | 90.91 % |
| Nakamura | 98.97 %* | 97.61 %* | 95.18 %* |

Table 3

Hyperparameter grid search values: window size refers to the search space of notes for the greedy algorithm, fuzziness refers to the amount of window overlap (see Section 4.1.1), metric refers to the local distance metric in the time warping algorithms, and γ refers to the gap penalty.

| METHOD | PARAMETERS | VALUES |
|---|---|---|
| Greedy | Window size | 1, 3, 5 |
| Linear | Fuzziness | 0.05n; n ∈ {1,…,20} |
| DTW | Fuzziness | 0.05n; n ∈ {1,…,20} |
| | Metric | cos, Lp; p ∈ {1, 2, 4, ∞} |
| NWTW | Fuzziness | 0.05n; n ∈ {1,…,20} |
| | Metric | cos, Lp; p ∈ {1, 2, 4, ∞} |
| | γ | 0.5, 1.0, 1.5, 2.0, 2.5, 3.0 |

Table 4

Hyperparameters and F-measures of the best performing models on the tuning set.

| METHOD | HYPERPARAMETERS | F-MEASURE |
|---|---|---|
| Greedy | Window size: 3 | 95.43 % |
| Linear | Fuzziness: 0.95 | 98.71 % |
| DTW | Fuzziness: 0.65, L4-norm | 98.74 % |
| NWTW | Fuzziness: 0.8, γ: 0.5, Cosine | 98.75 % |

Table 5

Values with superscripts are statistically better (*) or worse () than Nakamura’s automatic alignment (α = 0.01), respectively. Bold indicates the best result (or results where the difference is not significant) for each resolution (beats, measures) and dataset.

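F-measure (in %)

| RESOLUTION | METHOD | 4×22 | ZEILINGER | MAGALOFF |
|---|---|---|---|---|
| | Nakamura | 98.97 | 97.61 | 95.18 |
| Beats | Greedy | 99.28 | 98.09 | 95.68 |
| | Linear | 99.87* | 99.67* | 98.87* |
| | DTW | 99.81* | 99.48* | 98.67* |
| | NWTW | 99.91* | 99.61* | 98.78* |
| Measures | Greedy | 97.59 | 96.01 | 90.33 |
| | Linear | 99.28 | 99.30* | 97.82* |
| | DTW | 99.31* | 98.88 | 97.66* |
| | NWTW | 99.63* | 99.25* | 97.88* |

tismir-6-1-149-g4.png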
Figure 4

F-measure for models with global and beat level alignments. Results are reported on the Magaloff dataset.

tismir-6-1-149-g5.png
Figure 5

Effect of artificially added uniform noise on tapping annotations. Results are computed on the Magaloff dataset for beat-level alignments. The shaded areas indicate ±1 standard deviation from the mean.
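A noise experiment of this kind can be sketched by perturbing each annotated beat time with an independent uniform offset. The sketch below is an assumption-laden illustration: the symmetric interval [-max_deviation, +max_deviation], the function name `add_uniform_noise`, and the example beat times are not taken from the paper.

```python
import random


def add_uniform_noise(beat_times, max_deviation, seed=None):
    """Return beat times perturbed by independent uniform noise.

    Each time is shifted by a value drawn from
    [-max_deviation, +max_deviation] (assumed symmetric interval, seconds).
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    return [t + rng.uniform(-max_deviation, max_deviation) for t in beat_times]


# Hypothetical tapping annotations in seconds, perturbed by up to 50 ms.
beats = [0.0, 0.5, 1.0, 1.5, 2.0]
noisy_beats = add_uniform_noise(beats, max_deviation=0.05, seed=0)
print(noisy_beats)
```

Note that for noise levels larger than half the inter-beat interval, neighboring beats can swap order, so a real experiment may need to sort or otherwise handle the perturbed annotations.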

Table 6

ASAP dataset statistics: S is the number of scores, P is the number of performances, S-Notes and P-Notes are the numbers of notes in scores and performances, respectively, and Mins is the total duration of performances in minutes.

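| COMPOSER | S | P | S-NOTES | P-NOTES | MINS |
|---|---|---|---|---|---|
| Bach | 59 | 169 | 117218 | 321688 | 387 |
| Balakirev | 1 | 10 | 16490 | 139608 | 87 |
| Beethoven | 63 | 271 | 431704 | 1668873 | 1761 |
| Brahms | 1 | 1 | 3514 | 1667 | 6 |
| Chopin | 36 | 289 | 236186 | 1410369 | 1257 |
| Debussy | 2 | 3 | 10800 | 14470 | 13 |
| Glinka | 1 | 2 | 4246 | 9074 | 10 |
| Haydn | 12 | 44 | 56230 | 190942 | 215 |
| Liszt | 17 | 121 | 181274 | 1192297 | 900 |
| Mozart | 6 | 16 | 33796 | 73927 | 78 |
| Prokofiev | 1 | 8 | 9438 | 38231 | 33 |
| Rachmaninoff | 4 | 8 | 13552 | 20941 | 30 |
| Ravel | 4 | 22 | 32248 | 108519 | 140 |
| Schubert | 15 | 62 | 134576 | 453464 | 499 |
| Schumann | 11 | 28 | 63593 | 122356 | 129 |
| Scriabin | 2 | 13 | 18342 | 145441 | 125 |
| All | 235 | 1067 | 1363207 | 5911867 | 5670 |

tismir-6-1-149-g6.png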
Figure 6

Parangonada visualization of an aligned excerpt of Chopin’s Nocturne Op. 32 No. 2, measures 8–9. The top piano roll represents the performance; the bottom piano roll the score. Lines connect notes aligned by automatic note alignment models. The score is added for clarity and is not part of the interface. Parangonada is not aware of pitch spelling; all black notes are displayed as ♯ even though the piece is in A♭ major.

tismir-6-1-149-g7.png
Figure 7

A histogram of the number of notes performed by composer and the performance statistics of those notes. Pieces from four composers were gathered: Chopin, Bach, Beethoven and Liszt. The left histogram plot shows onset-wise tempo in seconds per beat. The right plot shows a histogram of articulation expressed as a note-wise dimensionless logarithm of played duration divided by notated duration.
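The two quantities plotted in this figure can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: tempo is taken between consecutive distinct score onsets, the notated duration is converted to seconds with a local tempo so that the ratio is dimensionless, and the natural logarithm is used (the caption does not state the base). The function names and example values are hypothetical.

```python
import math


def onset_tempo(score_onsets_beats, performed_onsets_sec):
    """Seconds per beat between consecutive onsets.

    Assumes strictly increasing score onsets (one entry per distinct onset).
    """
    pairs = list(zip(score_onsets_beats, performed_onsets_sec))
    return [
        (p1 - p0) / (s1 - s0)
        for (s0, p0), (s1, p1) in zip(pairs, pairs[1:])
    ]


def articulation(played_duration_sec, notated_duration_beats, seconds_per_beat):
    """Log ratio of played to notated duration.

    0 means the note is held exactly as notated, negative values are
    shorter (more staccato), positive values longer (more legato).
    """
    notated_duration_sec = notated_duration_beats * seconds_per_beat
    return math.log(played_duration_sec / notated_duration_sec)


tempi = onset_tempo([0.0, 1.0, 2.0], [0.0, 0.5, 1.1])
staccato = articulation(0.25, 1.0, 0.5)  # played half as long as notated
print(tempi, staccato)
```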

tismir-6-1-149-g8.png
Figure 8

Chord spread distribution in seconds for four composers. Chord spread is defined for each chord as the maximal time interval between performance note onsets belonging to the chord. The white dot shows the median, the thick horizontal line the quartiles, and the thin horizontal line the 5- and 95-percentiles.
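The chord spread defined in this caption can be computed with a simple grouping step. The sketch below assumes that a chord is the set of notes sharing the same score onset and that each note is given as a `(score_onset, performed_onset)` pair; both the grouping rule and the function name `chord_spreads` are assumptions made for illustration.

```python
from collections import defaultdict


def chord_spreads(notes):
    """Map each chord's score onset to its spread in seconds.

    notes: iterable of (score_onset_beats, performed_onset_sec) pairs.
    Single notes are skipped, since a spread needs at least two onsets.
    """
    chords = defaultdict(list)
    for score_onset, performed_onset in notes:
        chords[score_onset].append(performed_onset)
    # Spread = maximal time interval between performed onsets of the chord.
    return {
        onset: max(times) - min(times)
        for onset, times in chords.items()
        if len(times) > 1
    }


# Hypothetical aligned notes: a three-note chord, a single note, a dyad.
notes = [
    (0.0, 1.00), (0.0, 1.03), (0.0, 1.01),
    (1.0, 1.50),
    (2.0, 2.00), (2.0, 2.02),
]
spreads = chord_spreads(notes)
print(spreads)
```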

tismir-6-1-149-g9.png
Figure 9

Performance statistics for four performers on the Scriabin Sonata No. 5, measures 47–52. Left: dynamics (MIDI velocity, normalized to (0,1)); middle: timing (how much onsets of chord notes deviate from their mean, in seconds); right: articulation (how staccato or legato the notes are played; see also Figure 7). The horizontal gray lines indicate quantiles; see also Figure 8.

DOI: https://doi.org/10.5334/tismir.149 | Journal eISSN: 2514-3298
Language: English
Submitted on: Sep 1, 2022
Accepted on: Jun 2, 2023
Published on: Jun 26, 2023
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Silvan David Peter, Carlos Eduardo Cancino-Chacón, Francesco Foscarin, Andrew Philip McLeod, Florian Henkel, Emmanouil Karystinaios, Gerhard Widmer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.