PiJAMA: Piano Jazz with Automatic MIDI Annotations

Drew Edwards; Simon Dixon; Emmanouil Benetos

doi:10.5334/tismir.162

Figures & Tables

Table 1

Overview of automatic piano transcription techniques and their performance on the MAPS and MAESTRO datasets. For Hawthorne et al. (2019), the MAPS results come from a training configuration with data augmentation, and the MAESTRO results are without augmentation. For Kong et al. (2021), the MAPS results were obtained by evaluating the published checkpoint, and the MAESTRO results are the published numbers; this model was trained without data augmentation.

	MAPS			MAESTRO (V1)
MODEL	FRAME F1	ONSET F1	ON+OFFSET F1	FRAME F1	ONSET F1	ON+OFFSET F1
Sigtia et al. (2016)	72.22	46.58	18.38	–	–	–
Hawthorne et al. (2018)	78.30	82.29	50.22	–	–	–
Hawthorne et al. (2019)	84.91	86.44	67.43	90.15	95.32	80.50
Kong et al. (2021)	82.78	82.40	56.59	89.71	96.76	82.47
Hawthorne et al. (2021)	–	–	–	88.00	95.95	83.46

Table 2

Evaluation on transcribed solo jazz piano performances. Because the quality of the transcriptions varies, we report metrics for both 50- and 100-millisecond note onset tolerances. The results on RWC Jazz and Jazz Web show little improvement from the increased tolerance, whereas the metrics on the human-labeled evaluation sets (Joe Bagg and Daan Schreuder) improve markedly, suggesting greater misalignment in these sources.

DATASET	# RECORDINGS	HAWTHORNE ET AL. NOTE F1 (50MS)	HAWTHORNE ET AL. NOTE F1 (100MS)	KONG ET AL. NOTE F1 (50MS)	KONG ET AL. NOTE F1 (100MS)
RWC Jazz	4	0.932	0.938	0.909	0.910
Jazz Web	5	0.956	0.959	0.926	0.926
Joe Bagg	5	0.876	0.912	0.806	0.858
Daan Schreuder	8	0.889	0.910	0.865	0.881
per recording average	22	0.908	0.925	0.873	0.891
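
The effect of the onset tolerance can be illustrated with a minimal sketch of a note onset F1 computation. This is a simplified greedy matcher written for illustration, not the evaluation code behind the table (standard evaluations typically use an optimal one-to-one matching). The function name `onset_f1`, the `(onset_seconds, midi_pitch)` note format, and the example notes are all assumptions.

```python
def onset_f1(reference, estimated, tolerance=0.05):
    """Note onset F1 between two note lists (illustrative sketch).

    reference, estimated: lists of (onset_seconds, midi_pitch) tuples.
    A note counts as correct if an unmatched reference note has the same
    pitch and an onset within `tolerance` seconds. Matching is greedy
    (closest available reference note), a simplifying assumption.
    """
    matched = set()
    true_positives = 0
    for est_onset, est_pitch in sorted(estimated):
        best_index, best_diff = None, None
        for i, (ref_onset, ref_pitch) in enumerate(reference):
            if i in matched or ref_pitch != est_pitch:
                continue
            diff = abs(ref_onset - est_onset)
            if diff <= tolerance and (best_diff is None or diff < best_diff):
                best_index, best_diff = i, diff
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second note is 80 ms late in the estimate.
reference = [(0.00, 60), (0.50, 64), (1.00, 67)]
estimated = [(0.02, 60), (0.58, 64), (1.00, 67)]
f1_50ms = onset_f1(reference, estimated, tolerance=0.05)
f1_100ms = onset_f1(reference, estimated, tolerance=0.10)
```

In this toy case the late note is rejected at 50 ms but accepted at 100 ms, which is the kind of gain one would expect when the reference annotations themselves are slightly misaligned.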

Figure 1

Diagram of the data collection process for the PiJAMA dataset. Stages with a filtering effect are represented with an arrow block symbol.

Figure 2

Scatter plots depicting the relationship between transcription agreement and note onset F1 score. Each data point is computed from a performance in the MAPS test set.

Figure 3

Pitch histogram of all note events in the PiJAMA dataset.

Figure 4

Pitch histograms from pianists Jessica Williams (above) and Erroll Garner (below).

Table 3

Most frequently repeated compositions in the PiJAMA dataset.

FREQUENCY	COMPOSITION(S)
17	Body and Soul
13	All the Things You Are, Yesterdays
12	Sophisticated Lady
11	’Round Midnight
10	Blue Monk
9	Alone Together, Prelude to a Kiss, Sweet and Lovely
8	Someday My Prince Will Come, Jitterbug Waltz, Night and Day, My Funny Valentine, Darn That Dream, Someone to Watch Over Me, Don’t Blame Me, Blue Bolero, I Should Care, Lush Life, Everything Happens to Me, In a Sentimental Mood, Con Alma

Histogram grouping the number of artists by their duration of performance data, in half-hour increments. One pianist (Dick Hyman) is an outlier with over 18 hours of solo piano recordings.

Total performance duration for each artist in the PiJAMA-30 subset.

Bar plot of mean sliding pitch class entropy.
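
The caption does not define the measure, so the following is only one plausible reading of "mean sliding pitch class entropy": the Shannon entropy of the pitch class distribution inside a sliding window of notes, averaged over all windows. The window length, the hop size, and the choice to window over note counts rather than time are assumptions, as are the function names.

```python
import math
from collections import Counter


def pitch_class_entropy(pitches):
    """Shannon entropy (in bits) of the pitch class distribution."""
    if not pitches:
        return 0.0
    counts = Counter(p % 12 for p in pitches)  # fold MIDI pitch to pitch class
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_sliding_pitch_class_entropy(pitches, window=16, hop=8):
    """Average entropy over sliding windows of `window` notes.

    `window` and `hop` are illustrative defaults, not values from the paper.
    Sequences shorter than one window are scored as a single window.
    """
    if len(pitches) <= window:
        return pitch_class_entropy(pitches)
    starts = range(0, len(pitches) - window + 1, hop)
    values = [pitch_class_entropy(pitches[s:s + window]) for s in starts]
    return sum(values) / len(values)
```

Under this reading, a passage that repeats a single pitch class scores 0 bits, while a window that uses all twelve pitch classes equally often scores log2(12), roughly 3.58 bits.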

Table 4

Accuracy of artist prediction models. Two test scores are presented for each model condition: the accuracy on the track-split (all tracks of the dataset shuffled into an 80-10-10 split) and the average accuracy across three album-splits (one random album held out for each artist, yielding roughly an 80-10-10 split). The Album Effect column is the difference between accuracies on the track-split and average album-split.

MODEL CONDITION	TRACK-SPLIT TEST ACCURACY	ALBUM-SPLIT TEST ACCURACY (AVERAGE)	ALBUM EFFECT
Spectrogram CRNN	0.914	0.267	0.647
Spectrogram CRNN (Data Augmentation)	0.782	0.399	0.383
Transcription Feature CRNN	0.632	0.457	0.176
Transcription Feature CRNN (Data Augmentation)	0.629	0.545	0.085
Piano Roll CRNN	0.556	0.502	0.055
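
The album-split and the Album Effect described in the caption can be sketched as follows. This is a simplified illustration, not the authors' code: it produces only a train/test partition rather than the three-way split mentioned in the caption, and the track dictionary keys (`artist`, `album`) and function names are assumptions.

```python
import random
from collections import defaultdict


def album_split(tracks, seed=0):
    """Hold out one randomly chosen album per artist as test data.

    tracks: list of dicts with (assumed) keys "artist" and "album".
    Returns (train_tracks, test_tracks); no album appears in both.
    """
    rng = random.Random(seed)
    albums_by_artist = defaultdict(set)
    for track in tracks:
        albums_by_artist[track["artist"]].add(track["album"])
    # Sort before choosing so the result is reproducible for a given seed.
    held_out = {
        artist: rng.choice(sorted(albums))
        for artist, albums in albums_by_artist.items()
    }
    train = [t for t in tracks if t["album"] != held_out[t["artist"]]]
    test = [t for t in tracks if t["album"] == held_out[t["artist"]]]
    return train, test


def album_effect(track_split_accuracy, album_split_accuracies):
    """Track-split accuracy minus the mean album-split accuracy."""
    mean_album = sum(album_split_accuracies) / len(album_split_accuracies)
    return track_split_accuracy - mean_album


# Spectrogram CRNN row, using the reported album-split average as one value.
effect = album_effect(0.914, [0.267])
```

For the Spectrogram CRNN row this gives 0.914 minus 0.267, or 0.647, matching the table. Repeating `album_split` with three different seeds would give the three album-splits whose accuracies are averaged.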

