Table 1
Overview of automatic piano transcription techniques and their performance on the MAPS and MAESTRO datasets. For Hawthorne et al. (2019), the MAPS results are from a training configuration with data augmentation, and the MAESTRO results are without augmentation. For Kong et al. (2021), the MAPS results were evaluated with the published checkpoint and the MAESTRO results are the published numbers. Note that the Kong et al. model was trained without data augmentation.
Table 2
Evaluation on transcribed solo jazz piano performances. Due to varying quality in the transcriptions, we report metrics for both 50- and 100-millisecond note onset tolerances. The results on RWC Jazz and Jazz Web show little improvement from the increased tolerance, whereas the metrics on the human-labeled evaluation sets (Joe Bagg and Daan Schreuder) improve markedly, suggesting greater misalignment in these sources.
| DATASET | # | HAWTHORNE ET AL. NOTE F1 (50 MS) | HAWTHORNE ET AL. NOTE F1 (100 MS) | KONG ET AL. NOTE F1 (50 MS) | KONG ET AL. NOTE F1 (100 MS) |
|---|---|---|---|---|---|
| RWC Jazz | 4 | 0.932 | 0.938 | 0.909 | 0.910 |
| Jazz Web | 5 | 0.956 | 0.959 | 0.926 | 0.926 |
| Joe Bagg | 5 | 0.876 | 0.912 | 0.806 | 0.858 |
| Daan Schreuder | 8 | 0.889 | 0.910 | 0.865 | 0.881 |
| per recording average | 22 | 0.908 | 0.925 | 0.873 | 0.891 |

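To make the effect of the onset tolerance in Table 2 concrete, the following is a minimal sketch of a note onset F1 computation. It assumes each note is an `(onset_seconds, midi_pitch)` pair and uses a simple greedy one-to-one matching; the function name `onset_f1` and the toy notes are illustrative and not taken from the paper. Evaluation libraries such as `mir_eval` use an optimal bipartite matching instead, so scores may differ slightly from this simplification.

```python
def onset_f1(ref_notes, est_notes, tolerance=0.05):
    """Note onset F1 with greedy one-to-one matching (a simplification).

    Each note is an (onset_seconds, midi_pitch) tuple. An estimated note
    matches a reference note if the pitches are equal and the onsets
    differ by at most `tolerance` seconds.
    """
    matched = set()  # indices of estimated notes already used
    true_positives = 0
    for ref_onset, ref_pitch in sorted(ref_notes):
        best = None  # (onset difference, index) of the closest candidate
        for i, (est_onset, est_pitch) in enumerate(est_notes):
            if i in matched or est_pitch != ref_pitch:
                continue
            diff = abs(est_onset - ref_onset)
            if diff <= tolerance and (best is None or diff < best[0]):
                best = (diff, i)
        if best is not None:
            matched.add(best[1])
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(est_notes)
    recall = true_positives / len(ref_notes)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical notes): the middle note is 80 ms late.
reference = [(0.00, 60), (0.50, 64), (1.00, 67)]
estimate = [(0.02, 60), (0.58, 64), (1.00, 67)]
f1_50ms = onset_f1(reference, estimate, tolerance=0.05)
f1_100ms = onset_f1(reference, estimate, tolerance=0.10)
```

In this toy case the late note is rejected at 50 ms but accepted at 100 ms, which is the pattern the caption attributes to misaligned reference transcriptions.
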
Figure 1
Diagram of the data collection process for the PiJAMA dataset. Stages with a filtering effect are represented with an arrow block symbol.

Figure 2
Scatter plots depicting the relationship between transcription agreement and note onset F1 score. Each data point is computed from a performance in the MAPS test set.

Figure 3
Pitch histogram of all note events in the PiJAMA dataset.

Figure 4
Pitch histograms from pianists Jessica Williams (above) and Erroll Garner (below).
Table 3
Most frequently repeated compositions in the PiJAMA dataset.
| FREQUENCY | COMPOSITION(S) |
|---|---|
| 17 | Body and Soul |
| 13 | All the Things You Are, Yesterdays |
| 12 | Sophisticated Lady |
| 11 | ’Round Midnight |
| 10 | Blue Monk |
| 9 | Alone Together, Prelude to a Kiss, Sweet and Lovely |
| 8 | Someday My Prince Will Come, Jitterbug Waltz, Night and Day, My Funny Valentine, Darn That Dream, Someone to Watch Over Me, Don’t Blame Me, Blue Bolero, I Should Care, Lush Life, Everything Happens to Me, In a Sentimental Mood, Con Alma |

Figure 5
Histogram grouping the number of artists by their duration of performance data, in half-hour increments. One pianist (Dick Hyman) is an outlier with over 18 hours of solo piano recordings.

Figure 6
Total performance duration for each artist in the PiJAMA-30 subset.

Figure 7
Bar plot of notes-per-second.

Figure 8
Bar plot of mean sliding pitch class entropy.
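
One plausible reading of "mean sliding pitch class entropy" is sketched below: the Shannon entropy of the 12-bin pitch class distribution is computed over windows of consecutive notes and then averaged. The window length, hop size, and function names are assumptions made for illustration; the caption does not specify them.

```python
import math
from collections import Counter


def pitch_class_entropy(pitches):
    """Shannon entropy (in bits) of the pitch class distribution."""
    counts = Counter(p % 12 for p in pitches)  # fold MIDI pitches into 12 classes
    total = len(pitches)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_sliding_pitch_class_entropy(pitches, window=32, hop=16):
    """Average entropy over sliding windows of `window` consecutive notes.

    `window` and `hop` are hypothetical defaults, not values from the paper.
    """
    if not pitches:
        return 0.0
    if len(pitches) < window:
        # Too short for one full window: fall back to a single entropy value.
        return pitch_class_entropy(pitches)
    starts = range(0, len(pitches) - window + 1, hop)
    values = [pitch_class_entropy(pitches[s:s + window]) for s in starts]
    return sum(values) / len(values)


# A chromatic scale uses all 12 pitch classes equally: entropy is log2(12).
chromatic_entropy = pitch_class_entropy(list(range(60, 72)))
# Alternating two pitch classes gives 1 bit in every window.
two_note_entropy = mean_sliding_pitch_class_entropy([60, 61] * 16, window=4, hop=2)
```

The entropy ranges from 0 bits (a single repeated pitch class) to log2(12), roughly 3.58 bits, when all twelve classes are equally likely within a window.
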
Table 4
Accuracy of artist prediction models. Two test scores are presented for each model condition: the accuracy on the track-split (all tracks of the dataset shuffled into an 80-10-10 split) and the average accuracy across three album-splits (one random album held out for each artist, yielding roughly an 80-10-10 split). The Album Effect column is the difference between accuracies on the track-split and average album-split.
| MODEL CONDITION | SPLIT | TEST ACCURACY | ALBUM EFFECT |
|---|---|---|---|
| Spectrogram CRNN | 0.647 | ||
| Spectrogram CRNN (Data Augmentation) | 0.383 | ||
| Transcription Feature CRNN | 0.176 | ||
| Transcription Feature CRNN (Data Augmentation) | 0.085 | ||
| Piano Roll CRNN | 0.055 |

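The album-split and the Album Effect described in the Table 4 caption can be sketched as follows. This assumes each track is an `(artist, album, track_id)` tuple; the function names, the seed handling, and the example accuracies are hypothetical and are not values from the table. The sketch only separates held-out albums from the rest and does not reproduce how the validation portion was formed.

```python
import random
from collections import defaultdict


def album_split(tracks, seed=0):
    """Hold out one randomly chosen album per artist.

    `tracks` is a list of (artist, album, track_id) tuples. Returns
    (remaining_tracks, held_out_tracks). An artist with a single album
    would end up with no remaining tracks, so such artists should be
    handled separately.
    """
    rng = random.Random(seed)
    albums_by_artist = defaultdict(set)
    for artist, album, _ in tracks:
        albums_by_artist[artist].add(album)
    # Sort for reproducibility before drawing the held-out album.
    held_out = {
        artist: rng.choice(sorted(albums))
        for artist, albums in sorted(albums_by_artist.items())
    }
    remaining = [t for t in tracks if held_out[t[0]] != t[1]]
    test = [t for t in tracks if held_out[t[0]] == t[1]]
    return remaining, test


def album_effect(track_split_accuracy, album_split_accuracies):
    """Track-split accuracy minus the mean accuracy over album-splits."""
    mean_album = sum(album_split_accuracies) / len(album_split_accuracies)
    return track_split_accuracy - mean_album


# Toy data: two artists, two albums each, two tracks per album.
tracks = [
    (artist, album, f"{artist}-{album}-{n}")
    for artist in ("artist_a", "artist_b")
    for album in ("album_1", "album_2")
    for n in range(2)
]
remaining, test = album_split(tracks, seed=0)

# Made-up accuracies for illustration only.
effect = album_effect(0.9, [0.5, 0.6, 0.7])
```

A large positive Album Effect indicates that a model benefits from seeing other tracks of the same album during training, for example by picking up recording conditions rather than the pianist's playing.
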