Table 1
Comparison of major symbolic piano performance datasets with the PianoCoRe dataset and its tiers. Sources: R—recorded (Disklavier/hardware), T—transcribed (audio‑to‑MIDI), T‑HQ—transcription labeled as high quality. Metadata: P—performer, S—piano solo probability, D—deduplication flag, Q—quality label. †Annotations are not available for all performances. ‡Number of unique composer names computed from raw metadata, not manually verified.
| Dataset | Composers | Pieces | Performances | Hours | Sources | Scores | Alignments | Metadata |
|---|---|---|---|---|---|---|---|---|
| MAESTRO | 43 | – | 1,276 | 199 | R | no | no | P |
| (n)ASAP | 16 | 222 | 1,067 | 92 | R | 100% | beat/note | P |
| GiantMIDI | 2,786 | 10,855 | 10,855 | 1,237 | T | no | no | S |
| ATEPP | 25 | 1,596 | 11,742 | 1,009 | T | 43.6% | no | P, Q† |
| Aria‑MIDI | 19,021‡ | – | 1,186,253 | 100,629 | T | no | no | S, P† |
| PERiScoPe | 82 | 2,738 | 46,473 | 3,784 | R, T | 81.9% | note | P† |
| PianoCoRe‑C | 483 | 5,625 | 250,046 | 21,763 | R, T | 75.3% | no | P† |
| PianoCoRe‑B | 478 | 5,591 | 214,092 | 18,757 | R, T | 75.0% | no | P†, D, Q |
| PianoCoRe‑A | 151 | 1,591 | 157,207 | 12,509 | R, T | 100% | note | P†, D, Q |
| PianoCoRe‑A* | 137 | 1,517 | 130,275 | 10,330 | R, T‑HQ | 100% | note | P†, D, Q |

Figure 1
The three‑stage data matching and annotation pipeline used to create the PianoCoRe dataset.

Figure 2
Statistical overview of the PianoCoRe‑C dataset for the 50 most represented composers. Top: the total number of unique pieces per composer (blue) and the number of pieces with a musical score (light blue). Bottom: the average number of performances per piece, broken down by MIDI source.

Figure 3
Distribution of the number of musical pieces by the number of performances in PianoCoRe‑C.

Figure 4
MIDI performances from ASAP (orange) and ATEPP (blue), grouped by their original labels and plotted as a function of the performance‑to‑score note ratio and the adjusted alignment ratio.
Table 2
Distribution of MIDI quality labels computed using the alignment‑based heuristics for the deduplicated, aligned performances in PianoCoRe‑B.
| HQ | LQ | C | No Label |
|---|---|---|---|
| 170,312 | 4,545 | 140 | 40,597 |
Table 3
MIDI quality classification dataset splits.
| Split | S | HQ | LQ | C |
|---|---|---|---|---|
| training | 2,500 | 2,500 | 2,500 | 2,500 |
| real | 953 | 2,500 | 1,000 | 86 |
| synth | 1,547 | 0 | 1,500 | 2,414 |
| test | 200 | 200 | 200 | 54 |
| calibration | 662 | 6,525 | 893 | 54 |
Table 4
Evaluation of MIDI quality classifiers using F1 scores. Best scores in bold. no synth—no synthetic training data, mean—mean pooling (no [CLS]), no TL—no transformer layer before the classifier head, no MLM—token embeddings and classifier only. The last block shows feature‑masking ablations.
| Model | S | HQ | LQ | C | Avg. |
|---|---|---|---|---|---|
| base | 1.000 | 0.839 | 0.777 | 0.946 | 0.891 |
| no synth | 1.000 | 0.759 | 0.778 | 0.946 | 0.871 |
| mean | 1.000 | 0.828 | 0.752 | 0.881 | 0.865 |
| mean, no TL | 0.993 | 0.802 | 0.713 | 0.851 | 0.840 |
| no MLM | 0.995 | 0.773 | 0.667 | 0.842 | 0.819 |
| mask Pitch | 1.000 | 0.803 | 0.723 | 0.913 | 0.860 |
| mask Timing | 0.990 | 0.788 | 0.747 | 0.851 | 0.844 |
| mask Velocity | 1.000 | 0.834 | 0.776 | 0.893 | 0.876 |
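The Avg. column is consistent with an unweighted (macro) mean of the four per-class F1 scores. A minimal check for the base row, with values copied from the table (the variable names are illustrative):

```python
# Per-class F1 scores for the "base" model, copied from the table.
base_f1 = {"S": 1.000, "HQ": 0.839, "LQ": 0.777, "C": 0.946}

# Unweighted (macro) average over the four classes.
avg = sum(base_f1.values()) / len(base_f1)
# avg ≈ 0.8905, which rounds to the reported 0.891
```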
Table 5
The PianoCoRe dataset and its source subsets, labeled by the MIDI quality classifier.
| Source | S | HQ | LQ | C |
|---|---|---|---|---|
| ASAP | 0 | 1,066 | 0 | 0 |
| ATEPP | 0 | 10,231 | 900 | 433 |
| GiantMIDI | 11 | 2,071 | 52 | 5 |
| PERiScoPe | 82 | 34,596 | 91 | 4 |
| Aria‑MIDI | 1,151 | 180,977 | 18,359 | 17 |
| PianoCoRe | 1,244 | 228,941 | 19,402 | 459 |
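The PianoCoRe row equals the column-wise sum over the source subsets, which can be verified directly (the dictionary layout is illustrative):

```python
# Label counts (S, HQ, LQ, C) per source subset, copied from the table.
per_source = {
    "ASAP":      (0,     1_066,   0,      0),
    "ATEPP":     (0,     10_231,  900,    433),
    "GiantMIDI": (11,    2_071,   52,     5),
    "PERiScoPe": (82,    34_596,  91,     4),
    "Aria-MIDI": (1_151, 180_977, 18_359, 17),
}

# Column-wise totals over all sources.
totals = tuple(sum(col) for col in zip(*per_source.values()))
# totals == (1244, 228941, 19402, 459), matching the PianoCoRe row
```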

Figure 5
Real‑world alignment challenges motivating the RAScoP pipeline. Top: local timing errors (crossed links) and missing/extra notes. Bottom: large structural deviation from a missing score segment, causing incorrect links. Other performed notes remain usable. Alignments were computed with Parangonar.

Figure 6
Note‑level alignment and the RAScoP pipeline for alignment refinement. The processing steps are demonstrated using an artificial example containing all types of errors. Score notes are drawn in black and performance notes are drawn in blue and green.

Figure 7
Distribution of inter‑onset deviations and beat tempos for alignments before processing (‑), after hole processing (H), after onset cleaning (O), and after both hole and onset cleaning (H + O).
Table 6
Mean alignment recall after each alignment refinement stage, and the percentage of sequences falling within each recall band.
| Band | Raw recall | Raw % | After H recall | After H % | After H+O recall | After H+O % |
|---|---|---|---|---|---|---|
| 0.95–1.00 | 0.975 | 54.3 | 0.975 | 53.9 | 0.973 | 42.9 |
| 0.90–0.95 | 0.929 | 26.6 | 0.929 | 26.7 | 0.928 | 30.4 |
| 0.85–0.90 | 0.879 | 10.1 | 0.878 | 10.0 | 0.878 | 13.3 |
| 0.80–0.85 | 0.828 | 4.7 | 0.828 | 4.6 | 0.828 | 6.5 |
| 0.75–0.80 | 0.779 | 2.1 | 0.778 | 2.2 | 0.777 | 3.2 |
| 0.70–0.75 | 0.725 | 1.1 | 0.727 | 1.0 | 0.728 | 1.6 |
| 0.60–0.70 | 0.660 | 0.7 | 0.663 | 1.1 | 0.661 | 1.5 |
| 0.00–0.60 | 0.471 | 0.4 | 0.464 | 0.5 | 0.462 | 0.6 |
| all | 0.935 | 100.0 | 0.934 | 100.0 | 0.920 | 100.0 |
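One way the per-band statistics could be tabulated is to bin sequence-level recalls and report each bin's mean recall and its share of sequences. A sketch under that assumption; the band edges follow the table, and `recall_band_stats` is a hypothetical helper, not part of the released pipeline:

```python
import numpy as np

# Band edges as in the table; the two lowest bands are wider than the rest.
EDGES = (0.0, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.0)

def recall_band_stats(recalls, edges=EDGES):
    """Mean recall and percentage of sequences per recall band."""
    recalls = np.asarray(recalls, dtype=float)
    stats = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Right-open bands, except the last one, which includes 1.0.
        mask = (recalls >= lo) & ((recalls < hi) if hi < 1.0 else (recalls <= hi))
        if mask.any():
            stats[f"{lo:.2f}-{hi:.2f}"] = (recalls[mask].mean(), 100.0 * mask.mean())
    return stats
```

For example, `recall_band_stats([0.97, 0.97, 0.92, 0.50])` places half of the sequences in the 0.95–1.00 band with a mean recall of 0.97.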

Figure 8
Validation loss curves for PianoFlow trained on different subsets of the data. Larger and refined training datasets reduce overfitting in the long run.
Table 7
Correlation between features of the rendered performances and the PianoCoRe‑A performances. First row—intra‑set correlations; remaining rows—models trained on different data subsets. Vel—velocity, IOI—inter‑onset interval, OD—relative onset deviation, Art—sustained articulation. The best scores are in bold.
| | Vel | IOI | OD | Art |
|---|---|---|---|---|
| Dataset | 0.57±0.19 | 0.90±0.06 | 0.22±0.17 | 0.44±0.19 |
| ASAP | 0.37±0.17 | 0.83±0.11 | 0.07±0.15 | 0.28±0.13 |
| + ATEPP | 0.42±0.16 | 0.85±0.11 | 0.12±0.14 | 0.35±0.15 |
| + PERiScoPe | 0.41±0.17 | 0.86±0.11 | 0.11±0.17 | 0.36±0.17 |
| PianoCoRe‑A | 0.40±0.17 | 0.86±0.11 | 0.10±0.17 | 0.35±0.17 |
| | 0.39±0.16 | 0.85±0.11 | 0.09±0.16 | 0.35±0.18 |
| w/o RAScoP | 0.41±0.16 | 0.85±0.11 | 0.09±0.16 | 0.36±0.18 |
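The table's entries are plausibly Pearson correlation coefficients between aligned per-note feature sequences of a rendered and a reference performance, averaged over pieces (the ± values being standard deviations). A minimal sketch under that assumption; `feature_correlation` is a hypothetical helper:

```python
import numpy as np

def feature_correlation(rendered, reference):
    """Pearson correlation between two aligned per-note feature sequences,
    e.g. velocities of a rendered and a human performance of the same score."""
    rendered = np.asarray(rendered, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.corrcoef(rendered, reference)[0, 1])
```

Dataset-level numbers would then be the mean ± standard deviation of this value over all evaluated pieces.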
Table 8
Conditional performance rendering (performance continuation) results across training subsets and unseen source sequences. Size denotes the training set size. Vel—Velocity (MIDI bins), TS—TimeShift (s), TD—TimeDurationSustain (s). Lower is better; best values are in bold.
| Dataset | Size | ASAP Vel | ASAP TS | ASAP TD | ATEPP Vel | ATEPP TS | ATEPP TD | PERiScoPe Vel | PERiScoPe TS | PERiScoPe TD | Aria‑MIDI Vel | Aria‑MIDI TS | Aria‑MIDI TD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ASAP | 1 k | 9.885 | 0.023 | 0.187 | 9.928 | 0.022 | 0.206 | 9.893 | 0.023 | 0.230 | 9.957 | 0.027 | 0.275 |
| + ATEPP | 6 k | 9.157 | 0.017 | 0.168 | 8.230 | 0.015 | 0.191 | 8.782 | 0.016 | 0.216 | 8.721 | 0.019 | 0.252 |
| + PERiScoPe | 25 k | 8.851 | 0.016 | 0.154 | 7.888 | 0.013 | 0.189 | 8.117 | 0.015 | 0.192 | 8.133 | 0.017 | 0.230 |
| PianoCoRe‑A | 124 k | 8.613 | 0.016 | 0.155 | 7.967 | 0.014 | 0.194 | 8.094 | 0.015 | 0.194 | 7.872 | 0.017 | 0.205 |
| | 141 k | 8.631 | 0.016 | 0.158 | 7.944 | 0.014 | 0.196 | 8.071 | 0.015 | 0.194 | 7.921 | 0.017 | 0.206 |
| w/o RAScoP | 124 k | 8.734 | 0.017 | 0.159 | 8.059 | 0.015 | 0.193 | 8.199 | 0.016 | 0.196 | 8.055 | 0.018 | 0.211 |
