
Multimodal Datasets for Studying Expert Performances of Musical Scores

Open Access | Dec 2025

Figures & Tables

Table 1

Proposed taxonomy of modalities relevant to expressive interpretations of musical scores.

| Data‑collection phase | Topic | Modality | Example(s) |
|---|---|---|---|
| Before the performance | Score | Engraved score | Western staff notation, printed |
| Before the performance | Score | Symbolic score | MusicXML, MIDI |
| Before the performance | Score | Score‑derived data | Music‑theoretic annotations/analyses |
| Before the performance | Instrument | Instrument characteristics | Type, model, mechanical properties, temperament |
| Before the performance | Venue | Venue visuals | Room lighting, capacity, aesthetics |
| Before the performance | Venue | Venue configuration | Layout, performer location |
| Before the performance | Venue | Venue acoustics | Size, shape, reverberance, acoustic anomalies |
| Before the performance | Performer | Performer biographical data | Age, gender, studies, pedagogical background, expertise |
| Before the performance | Performer | Performer physical attributes | Anthropometric measurements, state of physical fitness |
| Before the performance | Performer | Performer psychology | Personality |
| Before the performance | Performer | Interpretative set | Expressive intention or ideal |
| Before the performance | Listener | Listener biographical data | Personal, educational, professional/expertise |
| During the performance | Instrument | Instrument state | Tuning, responsiveness, mechanical wear |
| During the performance | Venue | Venue visuals | Room lighting, stage effects or decorations |
| During the performance | Venue | Venue configuration | Configuration of player(s) and instrument(s), presence and placement of intrusive recording equipment |
| During the performance | Venue | Venue acoustics | Humidity or temperature shifts |
| During the performance | Listener | Listener configuration | Presence, location |
| During the performance | Listener | Listener physiological state | Heart rate, skin conductance, respiration, pupil dilation, brain activity, eye tracking |
| During the performance | Performer | Performer movements | Performer gestures, performer–performer interaction, performer–audience interaction (e.g. video, motion capture [MoCap]) |
| During the performance | Performer | Performer physiological state | Heart rate, skin conductance, respiration, pupil dilation, brain activity, eye tracking |
| During the performance | Recording process | Recording setup | Type, settings, and positioning of recording equipment |
| During the performance | Recording process | Recording post‑production | Cuts, splices, equalisation, reverberation adjustment, file compression |
| During the performance | Performance sound | Audio | Mixed recording or master tracks |
| During the performance | Performance sound | Ambient noise | External noise bleed, HVAC system sounds, audience noise |
| During the performance | Performance sound | Audio‑derived data | Audio‑derived representations (e.g. spectrograms, loudness curves, audio–score alignment) |
| After the performance | Performer | Performer assessment of the interpretation | Reflection on the extent to which interpretational intent was carried out, post‑performance/ad hoc justification |
| After the performance | Performer | Performer evaluation of the experience | (Dis)comfort during performance, attention/distraction |
| After the performance | Listener | Listener evaluation of the performance | Physiological reactions, aesthetic preference, emotional response, attention, physical (dis)comfort during performance |
| After the performance | Listener | Listener evaluation of the performer | Stage presence, movements, facial expressions |

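
As a rough illustration of how the taxonomy in Table 1 could be used to compare datasets, the sketch below encodes a handful of its rows as (phase, topic, modality) tuples and computes which of those modalities a dataset provides. The names `TAXONOMY`, `modalities_by_phase`, and `coverage`, the short phase labels, and the example modality set are all hypothetical and do not come from the article. Only a subset of the table is encoded, with ASCII hyphens in place of the table's non-breaking hyphens.

```python
from collections import defaultdict

# Hypothetical encoding of a few Table 1 rows as (phase, topic, modality).
# This is a subset chosen for illustration, not the full taxonomy.
TAXONOMY = [
    ("before", "Score", "Engraved score"),
    ("before", "Score", "Symbolic score"),
    ("before", "Score", "Score-derived data"),
    ("during", "Performer", "Performer movements"),
    ("during", "Performance sound", "Audio"),
    ("during", "Performance sound", "Audio-derived data"),
    ("after", "Listener", "Listener evaluation of the performance"),
]


def modalities_by_phase(rows):
    """Group modality names by data-collection phase, preserving row order."""
    grouped = defaultdict(list)
    for phase, _topic, modality in rows:
        grouped[phase].append(modality)
    return dict(grouped)


def coverage(dataset_modalities, rows=TAXONOMY):
    """Return the fraction of encoded taxonomy modalities a dataset provides.

    Modalities that are not in the encoded subset are ignored.
    """
    known = {modality for _phase, _topic, modality in rows}
    return len(known & set(dataset_modalities)) / len(known)


if __name__ == "__main__":
    # Illustrative modality set for an imaginary audio-plus-score dataset.
    example = {"Symbolic score", "Audio", "Audio-derived data"}
    print(modalities_by_phase(TAXONOMY)["during"])
    print(f"coverage of encoded subset: {coverage(example):.2f}")
```

A flat list of tuples keeps each row self-contained, which mirrors the table layout. A nested mapping from phase to topic to modality would work equally well if lookups by topic were the main use.
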
Table 2

Accessible multimodal music performance datasets assembled from extant recordings.

Dataset | Citation | Modalities | # players | # pieces* | # recordings | Instrument(s) or ensemble | Performance MIDI transcription
The Con Espressione Game Dataset2Cancino‑Chacón et al. (2020)Engraved score (PDF); symbolic score (MusicXML); performer biographical data (name); listener biographical data (expertise); recording setup (inferable from album metadata); audio‑derived data (loudness curves, spectrograms, MIDI, audio–score alignment annotations); listener evaluation of performance26945Piano‘Approximate’ from alignment/ loudness curves
ASAP3Foscarin et al. (2020)Symbolic score (MusicXML, MIDI); score‑derived data (music‑theoretic annotations); audio (some); ambient noise; audio‑derived data (MIDI, audio–score alignment annotations)Not listed2361,067PianoAutomatic + manual
PianoMotion10M4Gan et al. (2024)Performer movements (video, video annotations); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI)Not listedNot listed1,966PianoAutomatic + manual
CrestMusePEDB5†‡Hashida et al. (2008)Symbolic score (MusicXML, MIDI); performer biographical data (name); recording setup (inferable from album metadata); audio‑derived data (rough audio–score deviation data)Not listed~100Not listedPianoManual
Guqin dataset6Huang et al. (2020)Performer biographical data (name); recording setup (inferable from album metadata); audio; ambient noise (inferable from audio recording); audio‑derived data (note, tempo, and dynamic annotations; technique annotations)51039GuqinNone
GiantMIDI‑Piano7§Kong et al. (2022)Performer biographical data (name, dates, nationality); venue visuals and configuration (inferable from performance video); performer movements (video); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI)Not listed10,85510,855PianoAutomatic
MazurkaBL8†¶Kosta et al. (2018)Symbolic score (MusicXML); performer biographical data (name); recording setup (inferable from album metadata); audio‑derived data (loudness, expression, score‑aligned beat annotations)Not listed442,000PianoNone
GAPS9#Riley et al. (2024)Symbolic score (MusicXML); instrument characteristics (instrument‑specific features, tunings); venue visuals and configuration (inferable from performance video); performer biographical data (name, dates, nationality, gender); performer movements (video); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI, audio–score alignment annotations)205Not listed300GuitarNone
CHARM Mazurka Project10Sapp (2007)Recording setup (inferable from album metadata); audio; ambient noise (inferable from audio recording); audio‑derived data (some MIDI, some tempo and dynamics data)135492,926PianoManual
MusicNet11Thickstun et al. (2017)Performer biographical data (some names); recording setup (inferable from album metadata); audio; ambient noise (inferable from audio recording); audio‑derived data (note labels, MIDI)Not listedNot listed330VariedNone
SUPRA12Shi et al. (2019)Performer biographical data (name); audio; ambient noise (inferable from audio recording); audio‑derived data (‘raw’ MIDI hole file, ‘expressive’ MIDI dynamic hole file, piano roll image)151~430457PianoAutomatic + manual
Wagner Ring Dataset13Weiß et al. (2023)Engraved score (PDF); symbolic score (MusicXML, MIDI); score‑derived data (structural annotations, music‑theoretic annotations); performer biographical data (names); recording setup (inferable from album metadata); audio; ambient noise (inferable from audio recording); audio‑derived data (audio–score alignment)Not listedNot listed16Orchestra and vocalistsNone
Schubert Winterreise Dataset14† ††Weiß et al. (2021)Engraved score (PDF); symbolic score (MusicXML, MIDI); score‑derived data (structural annotations, music‑theoretic annotations); performer biographical data (names); instrument state (tuning); recording setup (inferable from album metadata); audio; ambient noise (inferable from audio recording); audio‑derived data (audio–score alignment annotations)1624216Piano, voiceNone
BPSD15Zeitler et al. (2024)Engraved score (PDF); symbolic score (Sibelius, MusicXML, MIDI); score‑derived data (music‑theoretic annotations); instrument characteristics (type); performer biographical data (names); recording setup (inferable from album metadata); audio (some); ambient noise (inferable from audio recording); audio‑derived data (MIDI, audio–score alignment)1132352PianoAutomatic + manual
ATEPP16Zhang et al. (2022)Symbolic score (43% MusicXML); performer biographical data (names); recording setup (inferable from album metadata); audio‑derived data (MIDI)491,59511,674PianoAutomatic

[i] * Individual movements of a larger work are counted as separate pieces.

[ii] † Dataset contains album names to allow any audio recordings not present in the dataset to be purchased commercially.

[iii] ‡ This first edition of CrestMusePEDB contains this number of commercially released recordings. The second edition adds performances recorded by the researchers and appears in Table 3.

[iv] § A subset of 7,236 recordings includes composers’ names in recording titles.

[v] ¶ MazurkaBL is an extension of the CHARM Mazurka Project.

[vi] # GAPS contains videos of 205 performers, but it is unclear if any audio‑only data may include additional players.

[vii] †† Number of performers includes different piano accompanists.

Table 3

Purpose‑recorded, publicly available multimodal datasets: Solo.

Dataset | Citation | Modalities | # players | # pieces* | # (N) or hours (H) of recording(s) | Instrument(s) | Performance MIDI transcription
ChoraleBricks17Balke et al. (2025)Symbolic score (MEI, MusicXML, MIDI, CSV); instrument characteristics (type); performer biographical data (birth year); recording setup (equipment type); audio; ambient noise (inferable from audio recording); audio‑derived data (audio–score alignment annotations)11102.7 HFlute, oboe, clarinet, trumpet, saxophone, baritone, trombone, tubaNone
Rach3 Dataset18Cancino‑Chacón and Pilov (2024)Symbolic score (MusicXML, MEI); instrument characteristics (type); venue visuals and configuration (inferable from performance video); performer movements (video); recording setup (equipment type, settings); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI); performer experience (log, description, Mood State Questionnaire)11~350 HPianoAutomatic
Bach Violin Dataset19Dong et al. (2022)Symbolic score (MusicXML); performer biographical data (name); recording post‑production (mixing, file conversion); audio; ambient noise (inferable from audio recording); audio‑derived data (estimated audio–score alignment annotations)17326.5 HViolinNone
Bach10 Dataset20Duan and Pardo (2012)Symbolic score (MIDI); audio; ambient noise (inferable from audio recording); audio‑derived data (audio–score alignment annotations)41040 NViolin, clarinet, saxophone, bassoonNone
The Vienna 4×22 Piano Corpus21Goebl (1999)Symbolic score; instrument characteristics (type); venue visuals and acoustics (described size, building name); performer biographical data (education, expertise); recording setup (equipment type, positioning); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI, audio–score alignment annotations)22488 NPianoAutomatic
RWC Music Database (Classical Music)22Goto et al. (2002)Performer biographical data (names); recording setup (inferable from album metadata); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI)194242 NPiano, violin, cello, flute, and othersManual
CrestMusePEDB (2nd edition)5Hashida et al. (2018)Symbolic score (MusicXML/MIDI); instrument characteristics (type); venue visuals (described locations); performer biographical data (names); interpretative set; recording setup (inferable from album metadata); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI, audio–score alignment annotations)1224443 NPianoAutomatic
MAESTRO23Hawthorne et al. (2019)Instrument characteristics (type); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI, audio–MIDI alignment)205~8641,276 NPianoAutomatic
PercePiano24§Park et al. (2024)Symbolic score (MusicXML, MIDI); instrument characteristics (type); listener biographical data (expertise); audio‑derived data (MIDI); listener evaluation of performance (annotations [Likert scale])251,202 excerpts1,202 NPianoAutomatic
The Batik‑plays‑Mozart corpus25Hu and Widmer (2023)Engraved score (PDF); score‑derived data (music‑theoretic annotations by Hentschel et al., 2021); symbolic score (MusicXML); instrument characteristics (type); performer biographical data (name); recording setup (inferable from album metadata); audio‑derived data (MIDI, MIDI–score alignment)13636 N (3.75 H)PianoAutomatic
SPD26Jin et al. (2024)Instrument characteristics (video); venue visuals and configuration (inferable from performance video); performer movements (video, 3D motion annotations); recording setup (equipment type and placement); audio; ambient noise (inferable from audio recording)9120> 3 HCello, violinNone
SMD MIDI‑Audio Piano Music Collection27Müller et al. (2011)Instrument characteristics (type); venue configuration (described location); performer biographical data (expertise); recording setup (equipment type and placement); recording post‑production; audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI)Not listed3850 NPianoAutomatic
Piano Syllabus Dataset28Ramoneda et al. (2025)Instrument characteristics (type); venue visuals and configuration (inferable from performance video); performer movements (video); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI, CQT, piano rolls)Not listed7,9017,901 NPianoUnclear
Piano gestures dataset29Sarasúa et al. (2017)Symbolic score (MIDI); instrument characteristics (type, video); venue visuals and configuration (inferable from performance video); interpretative set (researchers asked them to play in different ways); performer movements (video, MoCap); recording setup (equipment type); audio; ambient noise (inferable from audio recording)21105 NPianoAutomatic
Violin gestures dataset29Sarasúa et al. (2017)Performer biographical data (expertise); interpretative set (researchers asked them to play in different ways); performer movements (EMG, accelerometer, gyroscope); recording setup (equipment type); audio; ambient noise (inferable from audio recording)81880 NViolinNone
Telemann’s 12 Fantasias for Solo Flute30Thibaud et al. (2025)Engraved score (PDF); symbolic score (MEI, MSCZ); instrument characteristics (type); performer biographical data (name); audio; ambient noise (inferable from audio recording); audio‑derived data (audio–score alignment annotations)61272 NFluteNone
ARME Virtuoso Strings Dataset31Tomczak et al. (2023)Engraved score (PNG); interpretative set (researchers asked them to play in different ways); recording setup (equipment type and placement); audio; ambient noise (inferable from audio recording); audio‑derived data (note‑onset annotations)45746 NViola, violin, celloNone
CBFdataset32Wang et al. (2022)Venue visuals and acoustics (described room type); performer biographical data (expertise); performer movements (playing technique annotations); recording setup (equipment type); audio; ambient noise (inferable from audio recording)1042.6 HChinese bamboo flute (2 types)None
CCOM‑HuQin33Zhang et al. (2023)Engraved score (PDF); symbolic score (MusicXML); instrument characteristics (type); venue visuals (described type); performer biographical data (expertise, names); venue visuals and configuration (inferable from performance video); performer movements (video); recording setup (equipment type and placement); recording post‑production; audio; ambient noise (inferable from audio recording); audio‑derived data (audio–score alignment annotations); performer assessment of interpretation (notes of applied techniques)8571.28 HErhu, Banhu, Gaohu, Zhuihu, ZhonghuNone

[i] *Individual movements of a larger work are counted as separate pieces. For the Telemann Fantasias, each complete Fantasia is counted as one recording because the dataset presents each Fantasia whole rather than divided into sections.

[ii] †RWC Music Database and ARME Virtuoso Strings Dataset also contain ensemble performances and so are shown in Tables 3 and 4.

[iii] ‡Zhang and colleagues (2022) determined this number of performers by connecting names with the dataset’s recordings.

[iv] §PercePiano expanded upon data organised by MAESTRO with symbolic data (score and annotations).

[v] ¶Audio recordings may be purchased commercially.

Table 4

Purpose‑recorded, publicly available multimodal datasets: Ensemble.

Dataset | Citation | Modalities | # players | # pieces* | # (N) or hours (H) of recording(s) | Instrument(s) and/or ensemble type(s) | Performance MIDI transcription
Choral Singing Dataset34Cuesta et al. (2018)Performer biographical data (ensemble name); recording setup (equipment type and placement); audio; ambient noise; audio‑derived data (MIDI)16348 NVoiceAutomatic + manual
Quartet Body Motion and Pupillometry Dataset35Bishop and Jensenius (2020)Instrument characteristics (video); venue visuals (described location); performer biographical data (expertise); listener biographical data (expertise); venue visuals and configuration (inferable from video recording); performer movements (MoCap, video); performer physiological state (eye tracking); recording setup (equipment type and placement); audio; ambient noise (inferable from audio recording); performer experience (difficulty ratings)549 NString quartetNone
RWC Music Database (Classical Music)22Goto et al. (2002)Performer biographical data (ensemble name); audio; ambient noise (inferable from audio recording); audio‑derived data (MIDI)~962020 NOrchestra, chamber ensemblesManual
URMP36Li et al. (2018)Engraved score (PDF); symbolic score (MIDI); instrument characteristics (video); venue visuals (described); performer biographical data (expertise); venue visuals and configuration (inferable from performance video); performer movements (video); recording setup (equipment type and placement); recording post‑production; audio; ambient noise (inferable from audio recording); audio‑derived data (audio annotations)234444 NString duo, trio, quartet, quintetNone
EEP37Marchini et al. (2014)Engraved score (PDF); performer biographical data (expertise); performer movements (MoCap, bowing annotations); recording setup (equipment type); recording post‑production; audio; ambient noise (inferable from audio recording); audio‑derived data (audio–score alignment annotations)4523 NString quartetNone
QUARTET38Papiotis (2015)Instrument characteristics (video); venue visuals (described); venue visuals and configuration (inferable from performance video); performer movements (MoCap, video); recording setup (equipment type and placement); audio; ambient noise (inferable from audio recording); audio‑derived data (audio–score alignment annotations)4Not listed96 NString quartetNone
Erkomaishvili Dataset39Rosenzweig et al. (2020)Symbolic score (MusicXML); performer biographical data (name); recording setup (equipment type); audio; ambient noise (inferable from audio recording); audio‑derived data (performed note‑onset annotations, fundamental frequency)1101101 N (7 H)VoiceNone
PHENICX‑conduct dataset40Sarasúa (2017)Symbolic score (MusicXML/MIDI); instrument characteristics (types); venue visuals (described location); performer biographical data (ensemble name); venue visuals and configuration (inferable from performance video); performer movements (video); recording setup; audio; ambient noise (inferable from audio recording)Not listed3 excerpts75 NOrchestraNone
ARME Virtuoso Strings Dataset31Tomczak et al. (2023)Engraved score (PNG); interpretative set (researchers asked them to play in different ways); recording setup (equipment type and placement); audio; ambient noise (inferable from audio recording); audio‑derived data (note‑onset annotations)45746 NString duo, trio, quartetNone

[i] *Individual movements of a larger work are counted as separate pieces.

[ii] †RWC Music Database and ARME Virtuoso Strings Dataset contain both solo and ensemble performances and so appear in both Tables 3 and 4.

DOI: https://doi.org/10.5334/tismir.230 | Journal eISSN: 2514-3298
Language: English
Submitted on: Oct 1, 2025 | Accepted on: Nov 14, 2025 | Published on: Dec 23, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Katelyn Emerson, Peter M. C. Harrison, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.