
Figure 1
Overview of the STAR Drums creation process. An MSS algorithm separates the original mix into the non‑drum stem and the original drum stem. An ADT algorithm estimates an annotation from the original drum stem, and virtual drum instruments render the re‑synthesized drum stem from that annotation. Mixing the re‑synthesized drum stem with the non‑drum stem yields the STAR mix. The STAR mix and the estimated annotation, which is now regarded as the reference annotation, together form the STAR Drums dataset.
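The data flow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration only: `create_star_item` and the three `toy_*` stand-ins are hypothetical names, and the stand-ins replace the real MSS algorithm, ADT algorithm, and virtual drum instruments with trivial list operations so that the example runs without external dependencies.

```python
def create_star_item(original_mix, separate, transcribe, render):
    """Run the Figure 1 data flow on one track (hypothetical helper).

    separate(mix)               -> (non_drum_stem, drum_stem)  # MSS stand-in
    transcribe(drum_stem)       -> annotation                  # ADT stand-in
    render(annotation, n)       -> re-synthesized drum stem    # virtual drums stand-in
    """
    non_drum_stem, drum_stem = separate(original_mix)
    annotation = transcribe(drum_stem)
    resynth_stem = render(annotation, len(non_drum_stem))
    # The STAR mix is the sample-wise sum of both stems.
    star_mix = [a + b for a, b in zip(non_drum_stem, resynth_stem)]
    # The estimated annotation is returned as the reference annotation.
    return star_mix, annotation


# Toy stand-ins (assumptions for illustration, not the real algorithms).
def toy_separate(mix):
    # Pretend that every sample with magnitude above 0.5 belongs to the drums.
    drums = [x if abs(x) > 0.5 else 0.0 for x in mix]
    rest = [x - d for x, d in zip(mix, drums)]
    return rest, drums


def toy_transcribe(stem):
    # "Annotation" = indices of non-silent samples.
    return [i for i, x in enumerate(stem) if x != 0.0]


def toy_render(annotation, num_samples):
    # Place a unit impulse at every annotated position.
    out = [0.0] * num_samples
    for i in annotation:
        out[i] = 1.0
    return out


star_mix, reference_annotation = create_star_item(
    [0.1, 0.9, 0.2, -0.8], toy_separate, toy_transcribe, toy_render
)
```

Because the drum stem is rendered from the annotation, the annotation matches the drums in the STAR mix by construction, even where the ADT algorithm made errors on the original drum stem.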
Table 1
Overview of available ADT datasets.
| Dataset | Non‑drum Instr. | Drums | Vocals | Melodic Instr. | # Drum Classes | Len. [h] |
|---|---|---|---|---|---|---|
| RWC Music Database (Goto et al., 2002) | Rec. | Rec. | Yes | Yes | 29 | 18.1 |
| ENST Drums (Gillet and Richard, 2006) | Rec. & Synth. | Rec. | No | Yes | 20 | 1.0 |
| MDB Drums (Southall et al., 2017) | Rec. | Rec. | Yes | Yes | 20 | 0.4 |
| RBMA13 (Vogl et al., 2017) | Rec. | Rec. | Yes | Yes | 23 | 1.9 |
| TMIDT (Vogl et al., 2018) | Synth. | Synth. | No | Yes | 18 | 257.1 |
| Slakh (Manilow et al., 2019) | Synth. | Synth. | No | Yes | — | 118.3 |
| A2MD (Wei et al., 2021) | Rec. | Rec. | Yes | Yes | 3 | 34.5 |
| ADTOF‑RGW (Zehren et al., 2021) | Rec. | Rec. | Yes | Yes | 5 | 89.2 |
| ADTOF‑YT (Zehren et al., 2023) | Rec. | Rec. | Yes | Yes | 5 | 202.2 |
| Proposed STAR Drums | Rec. | Synth. | Yes | Yes | 18 | 124.5 |
[i] In Slakh, no mapping from MIDI notes to drum classes is provided. Therefore, the number of supported classes depends on the mapping created by the user.

Figure 2
MIDI note velocity distribution of all drum classes and all tracks.
Table 2
Input data for STAR Drums.
| Dataset | Instrument stems provided | # Total Tracks | # Used Tracks | Len. Used Tracks [h] |
|---|---|---|---|---|
| MUSDB18 (Rafii et al., 2017) | Yes | 150 | 150 | 9.8 |
| ISMIR04 (Cano et al., 2006) | No | 2000 | 1228 | 98.4 |
| MTG‑Jamendo (Bogdanov et al., 2019) | No | 55000 | 4807 | 302.9 |
Table 3
Splits of STAR Drums.
| Split | Origin of data | MSS algorithm applied | Full tracks | Len. [h] |
|---|---|---|---|---|
| Training | ISMIR04 | Yes | No | 20.6 |
| Training | MTG‑Jamendo | Yes | No | 94.1 |
| Training (total) | ISMIR04 + MTG‑Jamendo | Yes | No | 114.7 |
| Validation | MUSDB18 | No | Yes | 8.3 (6.7) |
| Test | MUSDB18 | No | Yes | 1.6 (0.3) |
[i] Values in brackets indicate the duration of audio files that users must create by executing a mixing script. This is necessary because some track licenses of MUSDB18 do not permit the redistribution of remixed versions.
Table 4
Drum classes used, with their mapping to the eight‑, five‑, and three‑class vocabularies, based on Vogl et al. (2018) and Zehren et al. (2023).
| Class name | 18 classes | 8 classes | 5 classes | 3 classes |
|---|---|---|---|---|
| Bass drum | BD | BD | BD | BD |
| Snare drum | SD | SD | SD | SD |
| Side stick | SS | | | |
| Hand clap | CLP | | | |
| Closed hi‑hat | CHH | HH | HH | HH |
| Pedal hi‑hat | PHH | | | |
| Open hi‑hat | OHH | | | |
| Tambourine | TB | | | |
| Low tom | LT | TT | TT | |
| Mid tom | MT | | | |
| High tom | HT | | | |
| Splash cymbal | SPC | CY | CY | |
| Chinese cymbal | CHC | | | |
| Crash cymbal | CRC | | | |
| Ride cymbal | RD | RD | | |
| Ride bell | RB | BE | | |
| Cowbell | CB | | | |
| Clave/sticks | CL | CL | | |
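A mapping such as the one in Table 4 is typically applied by relabeling each annotated event. The sketch below covers only the reduction from 18 to 8 classes and rests on an assumption: an empty cell in the 8‑class column is read as continuing the nearest label above it. The names `MAP_18_TO_8` and `reduce_annotation` are hypothetical, and the five‑ and three‑class columns are not reproduced because their trailing empty cells do not determine a mapping.

```python
# 18-class label -> 8-class label, read off Table 4.
# Assumption: an empty 8-class cell continues the nearest label above it.
MAP_18_TO_8 = {
    "BD": "BD",
    "SD": "SD", "SS": "SD", "CLP": "SD",
    "CHH": "HH", "PHH": "HH", "OHH": "HH", "TB": "HH",
    "LT": "TT", "MT": "TT", "HT": "TT",
    "SPC": "CY", "CHC": "CY", "CRC": "CY",
    "RD": "RD",
    "RB": "BE", "CB": "BE",
    "CL": "CL",
}


def reduce_annotation(events, mapping=MAP_18_TO_8):
    """Relabel (onset_seconds, label) events with a smaller vocabulary.

    Events that collapse onto the same onset and label are merged,
    and the result is sorted by onset time, then label.
    """
    reduced = {(onset, mapping[label]) for onset, label in events}
    return sorted(reduced)


events_18 = [(0.5, "SS"), (0.0, "BD"), (0.5, "SD"), (0.25, "PHH")]
events_8 = reduce_annotation(events_18)
```

Merging duplicates matters when two fine‑grained classes, for example snare drum and side stick, occur at the same onset and map to the same coarse class.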

Figure 3
Genre distribution of the STAR Drums dataset.

Figure 4
Relative class frequencies of STAR Drums, MDB Drums, ENST Drums, and RBMA13.

Figure 5
Total number of detected drum sounds when transcribing ideal and non‑ideal non‑drum stems and ideal and non‑ideal drum stems of MUSDB18.
Table 5
Global F‑measure when training with TMIDT, Slakh, ADTOF, or STAR Drums and testing with MDB Drums, ENST Drums, RBMA13, or STAR Drums for models transcribing 3, 5, 8, and 18 classes.
| # Classes | Training Data | MDB Drums | ENST Drums | RBMA13 | STAR Drums Test |
|---|---|---|---|---|---|
| 3 | TMIDT | 0.78 | 0.71 | 0.62 | 0.72 |
| 3 | Slakh | 0.76 | 0.77 | 0.55 | 0.73 |
| 3 | STAR Drums | 0.81 | 0.78 | 0.67 | 0.85 |
| 3 | ADTOF‑RGW | 0.80 | 0.80 | 0.67 | 0.77 |
| 3 | ADTOF‑YT | 0.83 | 0.79 | 0.62 | 0.72 |
| 5 | TMIDT | 0.65 | 0.69 | 0.55 | 0.61 |
| 5 | Slakh | 0.68 | 0.72 | 0.48 | 0.59 |
| 5 | STAR Drums | 0.79 | 0.77 | 0.62 | 0.82 |
| 5 | ADTOF‑RGW | 0.78 | 0.75 | 0.60 | 0.72 |
| 5 | ADTOF‑YT | 0.79 | 0.76 | 0.59 | 0.66 |
| 8 | TMIDT | 0.63 | 0.66 | 0.52 | 0.63 |
| 8 | Slakh | 0.66 | 0.71 | 0.47 | 0.61 |
| 8 | STAR Drums | 0.75 | 0.74 | 0.61 | 0.80 |
| 18 | TMIDT | 0.58 | 0.61 | 0.41 | 0.55 |
| 18 | Slakh | 0.59 | 0.63 | 0.39 | 0.58 |
| 18 | STAR Drums | 0.67 | 0.66 | 0.50 | 0.78 |
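For readers unfamiliar with the metric in Table 5, the following sketch shows one common way to compute a global F‑measure for onset‑based transcription. It assumes that "global" means true positives, false positives, and false negatives are pooled over all classes before the F‑measure is computed, and it uses a hypothetical tolerance of 50 ms with a simple greedy matching; the exact evaluation settings behind Table 5 are not stated in this section.

```python
def match_onsets(reference, estimated, tolerance=0.05):
    """Greedily match onset times in seconds; return (tp, fp, fn)."""
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = tp = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            # Each reference onset is matched to at most one estimate.
            tp += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimate too early to match: false positive
        else:
            i += 1  # reference onset missed: false negative
    return tp, len(est) - tp, len(ref) - tp


def global_f_measure(reference, estimated, tolerance=0.05):
    """Pool counts over all classes, then compute F = 2TP / (2TP + FP + FN).

    reference, estimated: dict mapping class label -> list of onset times.
    """
    tp = fp = fn = 0
    for label in set(reference) | set(estimated):
        c_tp, c_fp, c_fn = match_onsets(
            reference.get(label, []), estimated.get(label, []), tolerance
        )
        tp += c_tp
        fp += c_fp
        fn += c_fn
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


reference = {"BD": [0.0, 1.0], "SD": [0.5]}
estimated = {"BD": [0.01, 1.2], "SD": [0.5], "HH": [0.25]}
score = global_f_measure(reference, estimated)  # 2 TP, 2 FP, 1 FN -> 4/7
```

Pooling the counts weights frequent classes such as bass drum and hi‑hat more heavily than rare ones, which is one reason per‑instrument results are reported separately.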

Figure 6
Global F‑measure and F‑measure per instrument on MDB Drums for five classes when training with TMIDT, Slakh, ADTOF‑RGW, ADTOF‑YT, STAR Drums, and the original mix of STAR Drums (see Section 4.2.4). Class abbreviations are explained in Table 4.

Figure 7
Global F‑measure and F‑measure per instrument on MDB Drums for the 18‑class vocabulary when training with TMIDT, Slakh, STAR Drums, and the original mix of STAR Drums (see Section 4.2.4). The classes hand clap, cowbell, and clave/sticks are excluded because MDB Drums does not contain annotations for these classes. Class abbreviations are explained in Table 4.
Table 6
Global F‑measure results when transcribing 5 and 18 classes after training with the original mix and estimated annotations (pseudo‑labels), with the STAR random mix, which combines re‑synthesized drum stems and non‑drum stems from different tracks, and with combinations of these training sets.
| # Classes | STAR Drums Training Data | MDB Drums | ENST Drums | RBMA13 | STAR Drums Test |
|---|---|---|---|---|---|
| 5 | STAR mix | 0.79 | 0.77 | 0.62 | 0.82 |
| 5 | Original mix | 0.78 | 0.80 | 0.60 | 0.73 |
| 5 | STAR random mix | 0.78 | 0.77 | 0.61 | 0.81 |
| 5 | STAR mix + original mix | 0.78 | 0.78 | 0.61 | 0.77 |
| 5 | STAR mix + STAR random mix | 0.77 | 0.74 | 0.60 | 0.80 |
| 5 | Original mix + STAR random mix | 0.78 | 0.78 | 0.60 | 0.77 |
| 18 | STAR mix | 0.67 | 0.66 | 0.51 | 0.78 |
| 18 | Original mix | 0.66 | 0.72 | 0.53 | 0.65 |
| 18 | STAR random mix | 0.65 | 0.67 | 0.51 | 0.77 |
| 18 | STAR mix + original mix | 0.68 | 0.70 | 0.54 | 0.73 |
| 18 | STAR mix + STAR random mix | 0.67 | 0.66 | 0.51 | 0.78 |
| 18 | Original mix + STAR random mix | 0.67 | 0.69 | 0.53 | 0.73 |
[i] For comparison, results using the STAR mix are also provided.
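The STAR random mix in Table 6 pairs re‑synthesized drum stems with non‑drum stems from different tracks. How the pairs were drawn is not stated in this section; the sketch below shows one possible scheme that guarantees no track is paired with itself. The function name `random_mix_pairs` and the `seed` parameter are hypothetical, and track identifiers are assumed to be unique.

```python
import random


def random_mix_pairs(track_ids, seed=0):
    """Pair each drum stem with the non-drum stem of a different track.

    Returns a list of (drum_track_id, non_drum_track_id) tuples in which
    every track appears exactly once in each position.
    """
    if len(track_ids) < 2:
        raise ValueError("need at least two tracks to form cross-track pairs")
    rng = random.Random(seed)
    order = list(track_ids)
    rng.shuffle(order)
    # A cyclic shift by a non-zero offset never maps an index onto itself.
    offset = rng.randrange(1, len(order))
    return [
        (order[i], order[(i + offset) % len(order)])
        for i in range(len(order))
    ]


pairs = random_mix_pairs(["track_a", "track_b", "track_c", "track_d"], seed=0)
```

Using every stem exactly once keeps the amount of training audio equal to that of the regular STAR mix, which makes the two conditions easier to compare.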
