Drumroll Please: Modeling Multi-Scale Rhythmic Gestures with Flexible Grids

Open Access | Nov 2021

Figures & Tables

Figure 1

Fixed-Grid representation of a 1-measure pattern for 12 drums in a web interface designed by Yuri Suzuki (808303.studio) and inspired by Roland’s TR-808 Rhythm Composer.

Figure 2

Fixed-Grid representation of a 1-measure pattern for a single drum in the interface for Propellerhead’s ReDrum drum machine.

Figure 3

(a) One measure of drums from the Groove MIDI Dataset visualized in piano-roll format. In a grid at 16th-note resolution, 9 of the 15 snare drum hits in this measure would be mapped to duplicate slots in a matrix; of these, only 3 notes (colored in yellow) could be kept, and the other 6 (colored in red) would need to be discarded or quantized. (b) Mapping drum onset events to slots in our proposed Flexible Grid data representation. Red notes are considered secondary. Each instrument channel (kick, snare, hi-hat, etc.) receives one primary event per 16th-note timestep, and space for secondary events is allocated using the minimum number of slots needed to fit the densest passages in the training set. Every event here has two continuous modification parameters: one for velocity and one for timing offset.
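
The slot mapping described in panel (b) can be illustrated with a minimal sketch for a single instrument channel. Everything here is an assumption made for illustration rather than the authors' implementation: the function name `to_flexible_grid`, onset times expressed in units of 16th-note steps, nearest-step assignment, and the rule that the onset closest to the grid line is treated as primary.

```python
from collections import defaultdict

EMPTY = (0, 0, 0.0)  # (hit, velocity, timing offset) for an unused slot


def to_flexible_grid(onsets, num_steps, slots_per_step):
    """Map (time_in_steps, velocity) onsets of one drum to grid slots.

    Slot 0 of each step holds the primary event; the remaining slots
    hold secondary events in time order. Hypothetical helper.
    """
    by_step = defaultdict(list)
    for time, velocity in onsets:
        # Nearest 16th-note step, clamped to the last step of the loop.
        step = min(int(time + 0.5), num_steps - 1)
        by_step[step].append((time - step, velocity))

    grid = [[EMPTY] * slots_per_step for _ in range(num_steps)]
    for step, events in by_step.items():
        if len(events) > slots_per_step:
            raise ValueError(f"step {step} needs {len(events)} slots")
        # Assumption: the event closest to the grid line is the primary one.
        primary = min(events, key=lambda e: abs(e[0]))
        secondary = sorted(e for e in events if e is not primary)
        for slot, (offset, velocity) in enumerate([primary] + secondary):
            grid[step][slot] = (1, velocity, offset)
    return grid


snare = [(0.0, 100), (0.2, 60), (0.3, 70), (1.0, 90)]
grid = to_flexible_grid(snare, num_steps=4, slots_per_step=3)
```

In this toy input, three snare hits fall nearest to step 0, so one occupies the primary slot and two occupy secondary slots; none is dropped.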

Table 1

Statistics of the Groove MIDI Dataset used to build a Flexible Grid representation at 16th-note resolution.

| Drum | Max # of Onsets within 1/16 Note |
| --- | --- |
| Kick | 3 |
| Snare | 7 |
| Closed Hi-hat | 4 |
| Open Hi-hat | 3 |
| Low Tom | 3 |
| Mid Tom | 3 |
| Hi Tom | 3 |
| Crash Cymbal | 2 |
| Ride Cymbal | 2 |
| Total | 30 |

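
A statistic of this kind could be computed along the following lines. This is a sketch under stated assumptions, not the authors' code: onset times are non-negative and given in units of 16th-note steps, each onset is assigned to its nearest step, and the names `max_onsets_per_step` and `slot_capacities` are hypothetical.

```python
from collections import Counter


def max_onsets_per_step(onset_times):
    """Largest number of onsets that share one 16th-note step."""
    # int(t + 0.5) rounds a non-negative time to its nearest step.
    counts = Counter(int(t + 0.5) for t in onset_times)
    return max(counts.values(), default=0)


def slot_capacities(onsets_by_drum):
    """Per-drum slot counts needed to fit the densest passage."""
    return {drum: max_onsets_per_step(times)
            for drum, times in onsets_by_drum.items()}


# Toy data, not taken from the Groove MIDI Dataset.
toy = {"snare": [0.0, 0.1, 0.2, 1.0], "kick": [0.0, 4.0]}
capacities = slot_capacities(toy)
total_slots = sum(capacities.values())
```

Summing the per-drum maxima gives the number of slots per timestep; in Table 1 that sum is 30.
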
Table 2

Counts and percentages of events in the Groove MIDI Dataset training data that would be quantized or dropped by different data representations, before any modeling takes place. Variable-length sequences in the Event-Based representation are between 4 and 300 tokens long.

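| Representation | # Skipped | % Skipped | Size |
| --- | --- | --- | --- |
| Fixed-Grid (16) | 24038 | 6.94% | 32 × 9 × 3 |
| Fixed-Grid (32) | 9875 | 2.85% | 64 × 9 × 3 |
| Fixed-Grid (64) | 3210 | 0.92% | 128 × 9 × 3 |
| Event-Based | 348 | 0.10% | X × 168 |
| Flexible Grid (16) | 0 | 0 | 32 × 30 × 3 |
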
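
To make the "# Skipped" idea concrete for the fixed-grid rows, the sketch below counts how many onsets of one drum collide with an already occupied slot at a given grid resolution. It is an illustration only: the function name `count_skipped`, the nearest-slot rounding, and the time unit (16th-note steps) are assumptions, and the toy input is not drawn from the dataset.

```python
from collections import Counter


def count_skipped(onset_times, resolution=16):
    """Count onsets beyond the first in each fixed-grid slot.

    onset_times are non-negative and in 16th-note steps; resolution is
    the grid resolution in notes per whole note (16, 32, or 64).
    """
    scale = resolution / 16  # slots per 16th note
    counts = Counter(int(t * scale + 0.5) for t in onset_times)
    # A fixed grid keeps one event per slot; the rest must be dropped
    # or quantized elsewhere.
    return sum(c - 1 for c in counts.values())


toy_times = [0.0, 0.1, 0.3, 0.55, 1.0]
skipped = {res: count_skipped(toy_times, res) for res in (16, 32, 64)}
```

As in Table 2, a finer grid reduces collisions but does not necessarily eliminate them, while the matrix grows in proportion to the resolution.
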
Figure 4

Results of a blind head-to-head listening survey. Eleven drummers each participated in 15 trials, choosing in each trial between pairs of two-measure drum loops generated by VAEs trained on each of three data representations.

Figure 5

VAE reconstruction quality (F1 scores per onset), plotted against sequences containing increasing numbers of drumrolls and fast gestures. Data are aggregated cumulatively: the leftmost point on the line includes all drum sequences, the next point includes all drum sequences that have at least one event captured in the secondary matrix S, and so on.
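
The cumulative aggregation described in this caption can be sketched as follows. The record layout `(num_secondary_events, tp, fp, fn)`, the function names, and the choice to pool true positives, false positives, and false negatives across sequences before computing F1 are all assumptions for illustration; the source does not state how the per-onset scores were pooled.

```python
def f1(tp, fp, fn):
    """F1 score from pooled onset counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_by_min_secondary(records, max_k):
    """F1 over sequences with at least k secondary events, k = 0..max_k.

    Each record is (num_secondary_events, tp, fp, fn) for one sequence.
    Returns None for thresholds that no sequence reaches.
    """
    curve = []
    for k in range(max_k + 1):
        subset = [r for r in records if r[0] >= k]
        if not subset:
            curve.append(None)
            continue
        tp = sum(r[1] for r in subset)
        fp = sum(r[2] for r in subset)
        fn = sum(r[3] for r in subset)
        curve.append(f1(tp, fp, fn))
    return curve


# Toy records, not results from the paper.
records = [(0, 10, 0, 0), (1, 8, 1, 1), (3, 5, 5, 5)]
curve = f1_by_min_secondary(records, max_k=4)
```

Because each threshold keeps only the sequences with at least k secondary events, points further right summarize progressively smaller and denser subsets of the data.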

Table 3

Accuracy scores for classifying drummer identity and genre with an MLP neural network, with 95% bootstrap confidence intervals. The Event-Based representation is excluded here because its variable-length sequences cannot be modeled with a feed-forward classification model.

| Representation | Drummer ID | Genre ID |
| --- | --- | --- |
| Fixed-Grid (16) | 0.634 ± 0.027 | 0.547 ± 0.026 |
| Fixed-Grid (32) | 0.650 ± 0.026 | 0.544 ± 0.026 |
| Fixed-Grid (64) | 0.615 ± 0.026 | 0.519 ± 0.026 |
| Event-Based | N/A | N/A |
| Flexible Grid (16) | 0.683 ± 0.024 | 0.540 ± 0.027 |

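
For readers unfamiliar with bootstrap confidence intervals on accuracy, here is a minimal percentile-bootstrap sketch using only the Python standard library. It assumes a list of per-example 0/1 correctness flags; the function name, the number of resamples, and the percentile method are illustrative choices, since the source does not say which bootstrap variant was used.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=1000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) with a percentile bootstrap CI.

    correct is a non-empty list of 0/1 flags, one per test example.
    """
    rng = random.Random(seed)
    n = len(correct)
    # Resample the test set with replacement and recompute accuracy.
    accs = sorted(sum(rng.choices(correct, k=n)) / n
                  for _ in range(n_resamples))
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx
    return sum(correct) / n, accs[lo_idx], accs[hi_idx]


# Toy predictions: 70 correct out of 100.
flags = [1] * 70 + [0] * 30
acc, lower, upper = bootstrap_accuracy_ci(flags)
```

A symmetric "±" figure like those in Table 3 could then be reported as half the width of the interval, though that reporting convention is also an assumption here.
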
DOI: https://doi.org/10.5334/tismir.98 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 1, 2021
Accepted on: Sep 8, 2021
Published on: Nov 17, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Jon Gillick, Joshua Yang, Carmine-Emanuele Cella, David Bamman, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.