
Figure 1
Schematic overview of the Beethoven Piano Sonata Dataset (BPSD). For the first movements of all 32 piano sonatas, the dataset comprises raw data in different representations (versions) such as score images, symbolic score representations, and audio recordings of different performances. For the different versions, we provide time-aligned annotations of measure positions, beats, global and local keys, chords, and structural elements. The score and the four audio versions indicated in dark blue are in the public domain (EU).
Table 1
Overview of the folder structure of the BPSD. Score-based folders contain files named in the format Beethoven_workID.ext, while audio-based folders contain files in the format Beethoven_workID_performerID.ext.
| Folder | Subfolder | Content |
|---|---|---|
| 0_RawData | | Raw audio and symbolic data |
| | audio_ripped | Audio files as ripped from the CD |
| | audio_ripped/WK64 | |
| | ... | |
| | audio_ripped/FG67 | |
| | score_pdf_scan | Scanned score from IMSLP |
| | score_pdf_repetitions | Symbolic score in PDF format with repeat signs |
| | score_pdf_unfolded | Symbolic score in PDF format with unfolded repetitions |
| | score_sibelius_repetitions | Symbolic score in Sibelius format with repeat signs |
| | score_sibelius_unfolded | Symbolic score in Sibelius format with unfolded repetitions |
| | score_xml_repetitions | Symbolic score in MusicXML format with repeat signs |
| | score_xml_unfolded | Symbolic score in MusicXML format with unfolded repetitions |
| | score_midi | MIDI export of the symbolic score |
| 1_Audio | | Audio files with coherent structure |
| 2_Annotations | | Annotations with musical and physical timelines |
| | ann_score_note | Note events with start and end given in musical time |
| | ann_score_chord | Harmony annotations given in musical time |
| | ann_score_localkey | Local key annotations given in musical time |
| | ann_score_globalkey | Global key annotations |
| | ann_score_structureFine | Fine structure annotations given in musical time |
| | ann_score_structureCoarse | Coarse structure annotations given in musical time |
| | ann_audio_note | Note events with start and end given in physical time |
| | ann_audio_midi | Note events in physical time in MIDI format |
| | ann_audio_beat | Beat annotations given in physical time |
| | ann_audio_measure | Measure annotations given in physical time |
| | ann_audio_startEnd | Start and end of audio recordings (for removing silence/applause) given in physical time |
| | ann_audio_syncInfo | Alignment tuples for converting between musical and physical timeline |
| | ann_audio_modifications | Annotations for structural modifications of recordings |
| | ann_audio_chord | Harmony annotations given in physical time |
| | ann_audio_localkey | Local key annotations given in physical time |
| | ann_audio_structureFine | Fine structure annotations given in physical time |
| | ann_audio_structureCoarse | Coarse structure annotations given in physical time |
| 3_Scripts | | Python scripts to convert raw data into the structured format |
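
The alignment tuples in ann_audio_syncInfo are described as a means of converting between the musical and the physical timeline. The following is a minimal sketch of such a conversion using piecewise-linear interpolation. It assumes that each tuple pairs a musical position (here in measures) with a physical position (in seconds) and that both columns increase monotonically. The array layout, the function names, and all values are illustrative assumptions, not the dataset's actual file format.

```python
import numpy as np

# Hypothetical alignment tuples: (musical time in measures, physical time in seconds).
# The values are made up for illustration; the real files may use another layout.
sync = np.array([
    [1.0, 0.50],
    [2.0, 2.50],
    [3.0, 4.10],
    [4.0, 6.10],
])

def musical_to_physical(measure_pos, sync):
    """Map musical time to seconds by piecewise-linear interpolation."""
    return np.interp(measure_pos, sync[:, 0], sync[:, 1])

def physical_to_musical(seconds, sync):
    """Map seconds to musical time; requires increasing physical times."""
    return np.interp(seconds, sync[:, 1], sync[:, 0])

# Position halfway through measure 2, and the musical position at 1.5 s
mid_measure_2 = musical_to_physical(2.5, sync)
pos_at_1_5_s = physical_to_musical(1.5, sync)
print(mid_measure_2, pos_at_1_5_s)
```

Note that `np.interp` clamps queries outside the annotated range to the first or last tuple, so positions before the first or after the last alignment point would need separate handling.
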
Table 2
Overview of audio recordings in the BPSD. The first four performances (WK64, FJ62, FG58, and AS35) are in the public domain and freely accessible within the BPSD. All remaining recordings are commercially available and can be identified by their EAN codes. Durations are given in the format hh:mm:ss.
| ID | Performer | Year | Label | EAN Code | Orig. Dur. | Final Dur. |
|---|---|---|---|---|---|---|
| WK64 | Wilhelm Kempff | 1964 | Deutsche Grammophon | 028944796629 | 03:18:26 | 03:45:31 |
| FJ62 | Fritz Jank | 1962 | Instituto Piano Brasileiro | available at IMSLP | 03:35:13 | 03:41:26 |
| FG58 | Friedrich Gulda | 1958 | Decca | 028948514519 | 03:34:00 | 03:34:00 |
| AS35 | Artur Schnabel | 1935 | Warner Classics | 0190295975050 | 03:31:03 | 03:33:35 |
| MC22 | Muriel Chemin | 2022 | Odradek | 855317003615 | 04:08:22 | 04:05:11 |
| MB97 | Malcolm Bilson et al. | 1997 | Claves | 7619931970721 | 03:52:23 | 03:46:08 |
| AB96 | Alfred Brendel | 1996 | Philips | 028941257529 | 03:54:34 | 03:52:28 |
| JJ90 | Jenő Jandó | 1990 | NAXOS | 730099150224 | 03:41:06 | 03:39:14 |
| DB84 | Daniel Barenboim | 1984 | Deutsche Grammophon | 028941375926, 028941376626 | 03:58:37 | 03:58:37 |
| VA81 | Vladimir Ashkenazy | 1981 | London Records | 028944370621 | 03:48:16 | 03:46:27 |
| FG67 | Friedrich Gulda | 1967 | Amadeo | 028947687610 | 03:25:02 | 03:25:02 |
| Total | | | | | 40:47:08 | 41:07:45 |
Table 3
Overview of the first movements of Beethoven’s 32 piano sonatas. For each movement, the table lists the work ID, nickname (if applicable), global key, mean, minimum, and maximum duration across the available recordings (see Table 2, with the corresponding performer ID in parentheses), number of measures, and the coarse structure. All durations are given in the format mm:ss.
| No. | Work ID | Name | Key | Mean Dur. | Min. Dur. | Max. Dur. | Meas. | Structure |
|---|---|---|---|---|---|---|---|---|
| 01 | Op002No1-01 | | F:min | 03:47 | 03:22 (AS35) | 04:33 (WK64) | 200 | E-E-D-R |
| 02 | Op002No2-01 | | A:maj | 07:04 | 06:23 (FG67) | 07:45 (MC22) | 452 | E-E-D-R |
| 03 | Op002No3-01 | | C:maj | 10:15 | 09:47 (FG58) | 11:25 (MC22) | 347 | E-E-D-R-C |
| 04 | Op007-01 | Grand Sonata | Eb:maj | 08:17 | 07:27 (AS35) | 08:58 (MC22) | 497 | E-E-D-R-C |
| 05 | Op010No1-01 | | C:min | 05:33 | 04:41 (AS35) | 06:13 (MC22) | 388 | E-E-D-R |
| 06 | Op010No2-01 | | F:maj | 05:38 | 05:03 (FG67) | 06:14 (VA81) | 268 | E-E-D-R |
| 07 | Op010No3-01 | | D:maj | 06:59 | 06:26 (FJ62) | 07:53 (JJ90) | 467 | E-E-D-R-C |
| 08 | Op013-01 | Pathétique | C:min | 08:56 | 08:06 (FG58) | 09:57 (MC22) | 431 | I-E-E-D-R-C |
| 09 | Op014No1-01 | | E:maj | 06:35 | 05:31 (VA81) | 07:25 (AB96) | 222 | E-E-D-R-C |
| 10 | Op014No2-01 | | G:maj | 07:06 | 05:49 (AS35) | 07:56 (AB96) | 263 | E-E-D-R-C |
| 11 | Op022-01 | | Bb:maj | 07:26 | 06:43 (AS35) | 08:36 (MC22) | 267 | E-E-D-R |
| 12 | Op026-01 | Funeral March | Ab:maj | 08:01 | 06:51 (FG67) | 10:02 (AS35) | 219 | T-V1-V2-V3-V4-V5 |
| 13 | Op027No1-01 | Son. q. u. fant. | Eb:maj | 05:12 | 04:36 (AB96) | 05:42 (FG58) | 106 | An-Al-T1 |
| 14 | Op027No2-01 | Moonlight | C#:min | 06:01 | 04:58 (AS35) | 07:28 (FG58) | 69 | P1-P2-P3-C |
| 15 | Op028-01 | Pastoral | D:maj | 09:58 | 08:58 (FJ62) | 11:39 (MC22) | 622 | E-E-D-R-C |
| 16 | Op031No1-01 | | G:maj | 06:23 | 05:44 (FG58) | 07:19 (MC22) | 435 | E-E-D-R-C |
| 17 | Op031No2-01 | Tempest | D:min | 08:27 | 06:49 (FG58) | 09:52 (MC22) | 320 | E-E-D-R-C |
| 18 | Op031No3-01 | The Hunt | Eb:maj | 08:29 | 07:53 (FG67) | 09:07 (MB97) | 341 | E-E-D-R-C |
| 19 | Op049No1-01 | Easy Sonata | G:min | 04:35 | 03:41 (JJ90) | 05:17 (MB97) | 143 | E-E-D-R-C |
| 20 | Op049No2-01 | Easy Sonata | G:maj | 04:37 | 04:19 (FJ62) | 05:10 (MC22) | 174 | E-E-D-R |
| 21 | Op053-01 | Waldstein | C:maj | 10:38 | 09:25 (FG67) | 11:36 (MC22) | 387 | E-E-D-R-C |
| 22 | Op054-01 | | F:maj | 05:38 | 04:58 (AS35) | 06:13 (MC22) | 154 | M1-Tr1-M2-Tr2-M3-C |
| 23 | Op057-01 | Appassionata | F:min | 09:35 | 07:35 (FG67) | 10:39 (DB84) | 262 | E-D-R-C |
| 24 | Op078-01 | A Thérèse | F#:maj | 07:04 | 06:20 (FG58) | 08:18 (MC22) | 206 | I-E-E-D-R-D-R |
| 25 | Op079-01 | Cuckoo | G:maj | 04:40 | 03:58 (AS35) | 05:12 (MC22) | 372 | E-E-D-R-D-R-C |
| 26 | Op081a-01 | Les adieux | Eb:maj | 07:04 | 06:00 (FG67) | 07:50 (DB84) | 308 | I-E-E-D-R-C |
| 27 | Op090-01 | | E:min | 05:35 | 04:34 (FG67) | 06:19 (MB97) | 245 | E-D-R-C |
| 28 | Op101-01 | | A:maj | 04:00 | 03:35 (WK64) | 04:29 (DB84) | 102 | E-D-R-C |
| 29 | Op106-01 | Hammerklavier | Bb:maj | 11:06 | 08:54 (AS35) | 13:04 (DB84) | 530 | E-E-D-R-C |
| 30 | Op109-01 | | E:maj | 03:46 | 03:14 (WK64) | 04:19 (DB84) | 99 | E-D-R-C |
| 31 | Op110-01 | | Ab:maj | 06:33 | 06:00 (FJ62) | 07:33 (DB84) | 116 | E-D-R-C |
| 32 | Op111-01 | | C:min | 09:05 | 08:20 (AS35) | 10:04 (VA81) | 209 | I-E-E-D-R-C |
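
The work IDs in Table 3 and the performer IDs in Table 2 combine into file names following the pattern given in the caption of Table 1 (Beethoven_workID.ext and Beethoven_workID_performerID.ext). A minimal sketch of building such names and splitting a work ID into its parts is shown below. The helper names, the regular expression, and the file extensions are hypothetical; the pattern is only derived from the IDs listed in Table 3.

```python
import re

def score_filename(work_id, ext):
    """File name for a score-based folder (extension chosen by the caller)."""
    return f"Beethoven_{work_id}.{ext}"

def audio_filename(work_id, performer_id, ext):
    """File name for an audio-based folder (extension chosen by the caller)."""
    return f"Beethoven_{work_id}_{performer_id}.{ext}"

# Pattern inferred from the work IDs in Table 3, e.g. Op002No1-01 or Op081a-01:
# three-digit opus (optional letter), optional "No<digit>", two-digit movement.
WORK_ID = re.compile(
    r"^Op(?P<opus>\d{3}[a-z]?)(?:No(?P<number>\d))?-(?P<movement>\d{2})$"
)

def parse_work_id(work_id):
    """Split a work ID into opus, number within the opus (or None), and movement."""
    match = WORK_ID.match(work_id)
    if match is None:
        raise ValueError(f"Unexpected work ID: {work_id!r}")
    return match.groupdict()

# "wav" is an assumed extension used only for this example
print(audio_filename("Op002No1-01", "WK64", "wav"))
print(parse_work_id("Op081a-01"))
```
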
Table 4
Accuracy of synchronization approaches. The table presents absolute errors between measure estimates obtained from audio-audio synchronization (based on manually annotated measure positions for WK64) and score-audio synchronization. Mean, median, and the confidence interval for all measures (left side) and for only those measures with a note onset (right side) are reported. All values are given in milliseconds.
| Version | Mean (all) | Median (all) | 95% Conf. (all) | Mean (onset) | Median (onset) | 95% Conf. (onset) |
|---|---|---|---|---|---|---|
| WK64 | 20 | 13 | 40 | 14 | 11 | 40 |
| FJ62 | 25 | 19 | 60 | 19 | 18 | 45 |
| FG58 | 23 | 16 | 41 | 17 | 14 | 40 |
| AS35 | 25 | 15 | 54 | 18 | 13 | 40 |
| MC22 | 27 | 20 | 63 | 21 | 20 | 60 |
| MB97 | 30 | 20 | 60 | 20 | 20 | 47 |
| AB96 | 28 | 18 | 46 | 18 | 17 | 40 |
| JJ90 | 24 | 17 | 43 | 16 | 16 | 40 |
| DB84 | 29 | 19 | 60 | 19 | 18 | 54 |
| VA81 | 24 | 17 | 56 | 17 | 16 | 41 |
| FG67 | 25 | 9 | 40 | 17 | 8 | 40 |
| All | 25 | 17 | 53 | 18 | 16 | 40 |
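
Statistics of the kind reported in Table 4 can be computed from two lists of measure positions for the same recording. The sketch below is only an illustration: it reads the "95% Conf." column as the 95th percentile of the absolute errors, which is an assumption because the caption does not spell out the computation, and the measure positions are made-up values.

```python
import numpy as np

def error_stats_ms(reference_s, estimate_s):
    """Absolute errors between two lists of measure positions (in seconds),
    summarized in milliseconds."""
    reference_s = np.asarray(reference_s, dtype=float)
    estimate_s = np.asarray(estimate_s, dtype=float)
    errors_ms = np.abs(estimate_s - reference_s) * 1000.0
    return {
        "mean": float(np.mean(errors_ms)),
        "median": float(np.median(errors_ms)),
        # Assumed reading of "95% Conf.": 95th percentile of the absolute errors
        "p95": float(np.percentile(errors_ms, 95)),
    }

# Illustrative measure positions in seconds (not taken from the dataset)
reference = [0.50, 2.50, 4.10, 6.10, 8.00]
estimate = [0.51, 2.48, 4.10, 6.14, 7.99]
stats = error_stats_ms(reference, estimate)
print(stats)
```
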

Figure 2
Overview of various annotations in the BPSD, illustrated using the first measures of the Piano Sonata Op. 14 No. 2 in G major. Measure positions are marked with red ticks, while beat positions are indicated by red dashed lines.

Figure 3
Synchronized score-audio training pair for learning pitch-class representations using a frame-wise loss function.
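
To illustrate how such a training pair might be formed, the sketch below turns note events given in physical time into a frame-wise binary pitch-class target matrix and evaluates a frame-wise loss against it. Several parts are assumptions made for illustration: the note-event layout (start, end, MIDI pitch), the frame rate, and the choice of binary cross-entropy as the loss, which the caption does not specify.

```python
import numpy as np

def pitch_class_targets(notes, num_frames, frame_rate):
    """Build a (12, num_frames) binary matrix from note events.

    notes: iterable of (start_s, end_s, midi_pitch); this layout is an assumption.
    """
    targets = np.zeros((12, num_frames), dtype=np.float32)
    for start, end, pitch in notes:
        first = max(int(np.floor(start * frame_rate)), 0)
        last = min(int(np.ceil(end * frame_rate)), num_frames)
        targets[int(pitch) % 12, first:last] = 1.0
    return targets

def framewise_bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over the 12 pitch classes, one value per frame."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return loss.mean(axis=0)

# Two overlapping notes (G4 and B4) at an assumed frame rate of 4 Hz
notes = [(0.0, 1.0, 67), (0.5, 1.5, 71)]
targets = pitch_class_targets(notes, num_frames=8, frame_rate=4)

# An uninformative prediction of 0.5 everywhere yields log(2) per frame
uniform_loss = framewise_bce(np.full((12, 8), 0.5), targets)
print(targets[7], targets[11], uniform_loss)
```
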

Figure 4
Visualization of a time-diatonic representation derived from the WK64 recording of the first movement of the Piano Sonata Op. 14 No. 2 in G major. The local-key reference annotations are indicated by the overlaid red rectangles.

Figure 5
Cross-version chord recognition for the initial measures of the Piano Sonata Op. 31 No. 1 in G major. The results are presented for all 11 performances, alongside the majority vote and the reference annotations.
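
A majority vote across versions, as shown in the figure, can be sketched as follows. The example assumes that the chord estimates of all versions have already been transferred to a shared musical time axis, so that each version contributes one label per position. The function name, the tie-breaking rule, and the chord labels are illustrative assumptions.

```python
from collections import Counter

def majority_vote(labels_per_version):
    """Return the most frequent label per position across versions.

    labels_per_version: dict mapping a version ID to a list of chord labels,
    all lists aligned to the same musical positions. Ties are resolved in
    favor of the label encountered first (dict insertion order).
    """
    sequences = list(labels_per_version.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("All versions must provide the same number of labels")
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*sequences)]

# Illustrative estimates for three versions (not actual results)
estimates = {
    "WK64": ["G:maj", "G:maj", "D:maj", "G:maj"],
    "FG58": ["G:maj", "E:min", "D:maj", "G:maj"],
    "AS35": ["G:maj", "E:min", "D:min", "C:maj"],
}
voted = majority_vote(estimates)
print(voted)
```
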
