Music Tempo Estimation: Are We Done Yet?

Figures & Tables

Figure 1

IR research cycle (Urbano et al., 2013).

Figure 2

ACC1 of several tempo estimation systems depending on tolerance measured on Ballroom with a ground truth based on beat annotations by Krebs et al. (2013).
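An accuracy-versus-tolerance curve of the kind plotted in Figure 2 can be sketched as follows. This is a minimal illustration, assuming ACC1 is the fraction of tracks whose estimate lies within a relative tolerance of the reference tempo (the caption itself does not define it). The function name `acc1` and the sample tempi are hypothetical and not taken from the article.

```python
def acc1(estimates, references, tolerance=0.04):
    """Fraction of estimates within +/- tolerance (relative) of the reference.

    Assumed definition: an estimate counts as correct when
    |estimate - reference| <= tolerance * reference.
    """
    if len(estimates) != len(references):
        raise ValueError("estimates and references must have equal length")
    if not references:
        raise ValueError("at least one track is required")
    hits = sum(
        1
        for est, ref in zip(estimates, references)
        if abs(est - ref) <= tolerance * ref
    )
    return hits / len(references)


# Hypothetical reference and estimated tempi in BPM (illustration only).
references = [100.0, 120.0, 140.0, 180.0]
estimates = [100.5, 60.0, 147.5, 180.0]

# Sweep the tolerance to obtain one accuracy value per tolerance setting.
curve = {tol: acc1(estimates, references, tol) for tol in (0.0, 0.01, 0.04, 0.08)}
```

Sweeping the tolerance rather than fixing it at a single value shows how strongly a reported accuracy depends on that choice: in this toy data the half-tempo estimate (60 vs. 120 BPM) never counts as correct, while the 147.5 BPM estimate only does so at the widest tolerance.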

Figure 3

Empirical distributions of (a) OE1, (b) OE2, (c) AOE1, and (d) AOE2 using kernel density estimation (KDE). Based on values measured for Ballroom using a median ICBI-derived ground truth created from beat annotations by Krebs et al. (2013). Ordered by year of publication (Scheirer, 1998; Klapuri et al., 2006; Davies et al., 2009; Oliveira et al., 2010; Gkiokas et al., 2012; Percival and Tzanetakis, 2014; Schreiber and Müller, 2014; Böck et al., 2015; Schreiber and Müller, 2017, 2018b). Estimates for zplane and echonest stem from Percival and Tzanetakis (2014).
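The four error measures named in the Figure 3 caption can be sketched in a few lines. The definitions below are assumptions, since the caption does not state them: OE1 is taken as the base-2 logarithm of the ratio of estimate to reference, OE2 as the smallest-magnitude OE1 after scaling the estimate by 1, 2, 3, 1/2, or 1/3, and AOE1 and AOE2 as their absolute values. All function names are hypothetical.

```python
import math

# Assumed set of tempo factors tolerated by the "2" variants of the metrics.
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def oe1(estimate, reference):
    """Signed octave error: log2 of estimate over reference (assumed definition)."""
    return math.log2(estimate / reference)


def oe2(estimate, reference, factors=OCTAVE_FACTORS):
    """Signed octave error with the smallest magnitude over the allowed factors."""
    candidates = [oe1(estimate * factor, reference) for factor in factors]
    return min(candidates, key=abs)


def aoe1(estimate, reference):
    """Absolute value of OE1."""
    return abs(oe1(estimate, reference))


def aoe2(estimate, reference):
    """Absolute value of OE2."""
    return abs(oe2(estimate, reference))
```

Under these assumptions, an estimate of 60 BPM against a reference of 120 BPM has an OE1 of -1 (one octave too slow) but an OE2 of 0, because doubling the estimate matches the reference exactly.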

Table 1

Popular public tempo datasets.

| Dataset | Recordings | Tempo Ann. | Beat Ann. |
|---|---|---|---|
| ISMIR04 Songs (Gouyon et al., 2006)¹ | 464 | BPM | No |
| Ballroom (Gouyon et al., 2006; Krebs et al., 2013)¹ | 698 | BPM | Yes |
| RWC-C (Goto et al., 2002)² | 50 | BPM | Yes |
| RWC-G (Goto et al., 2003)² | 100 | BPM | Yes |
| RWC-J (Goto et al., 2002)² | 50 | BPM | Yes |
| RWC-P (Goto et al., 2002)² | 100 | BPM | Yes |
| RWC-R (Goto et al., 2002)² | 15 | BPM | Yes |
| GTzan (Tzanetakis and Cook, 2002; Marchand and Peeters, 2015)¹ | 999 | BPM | Yes |
| Hainsworth (Hainsworth, 2004)¹ | 222 | BPM | Yes |
| ACM Mirum (Peeters and Flocon-Cholet, 2012)¹ | 1,410 | BPM | No |
| SMC (Holzapfel et al., 2012)¹ | 217 | BPM | Yes |
| GiantSteps Tempo (Knees et al., 2015; Schreiber and Müller, 2018a)³ | 664 | BPM / T1, T2, ST1 | No |
| Extended Ballroom (Marchand and Peeters, 2016)¹ | 4,180 | BPM | No |
| LMD Tempo (Raffel, 2016; Schreiber and Müller, 2018b)⁴ | 3,611 | BPM | No |

¹ Excerpts available. ² Requires application and purchase. ³ BeatPort previews, cached versions available from JKU. ⁴ 7Digital previews available.

Figure 4

Dependability index Φ̂ as a function of metric and track count. Vertical dotted line: actual number of tracks in the dataset. Horizontal dotted line: Φ̂ = 0.95. Desired quadrant shaded in pale orange. (a–g) Φ̂ based on estimates by Davies et al. (2009); Percival and Tzanetakis (2014); Böck et al. (2015); Schreiber and Müller (2017, 2018b). (h) Φ̂ based on MIREX 2018 results.

Figure 5

Histograms of BPM values for GTzan jazz.00053 based on (a) IBIs and (b) ICBIs.

Figure 6

Dependencies between application, use case, metric, and dataset (an arrow from A to B denotes that A depends on B).

Figure 7

Distributions of normalized tempi. The gray area marks the interval [0.96,1.04]. The shown percentage is the fraction of normalized tempi within the interval.

Figure 8

Percentage of tracks with c_var(t) < τ.
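The percentage plotted in Figure 8 can be illustrated with a short sketch. It assumes that the caption's stability measure is the coefficient of variation (standard deviation divided by mean) of a track's inter-beat intervals, which the caption does not spell out; the use of the population standard deviation is also an assumption. The function names and the two toy tracks are hypothetical.

```python
import statistics


def coefficient_of_variation(beat_times):
    """Coefficient of variation of the inter-beat intervals of one track.

    Assumption: population standard deviation divided by the mean.
    """
    intervals = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    if len(intervals) < 2:
        raise ValueError("at least three beats are required")
    return statistics.pstdev(intervals) / statistics.mean(intervals)


def fraction_below(tracks, tau):
    """Fraction of tracks whose coefficient of variation is strictly below tau."""
    values = [coefficient_of_variation(beats) for beats in tracks.values()]
    return sum(value < tau for value in values) / len(values)


# Hypothetical beat annotations in seconds (illustration only).
tracks = {
    "steady": [0.0, 0.5, 1.0, 1.5, 2.0],    # constant 0.5 s intervals
    "unsteady": [0.0, 0.5, 1.5, 2.0, 3.0],  # intervals alternate 0.5 s and 1.0 s
}

share_strict = fraction_below(tracks, tau=0.1)
share_loose = fraction_below(tracks, tau=0.5)
```

With these toy annotations only the steady track passes the stricter threshold, while both pass the looser one, mirroring how the share of tracks grows as τ increases.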

Figure 9

ACC2 for tracks with c_var(t) < τ. Lower τ coincides with higher accuracy. Datasets: (a) SMC, (b) Hainsworth, (c) GTzan, (d) Ballroom. Different y-scales used for clarity.

Figure 10

(a), (c) ACC1 and mean OE1 for T ± 10 BPM intervals. (b) Smoothed tempo distribution of tracks in Ballroom according to the ground truth from Percival and Tzanetakis (2014). (d) OE1 predictions of generalized additive models (GAM). Shaded areas correspond to 95% confidence intervals.

Figure 11

(a) Per genre OE1 distributions based on kernel density estimation (KDE) for tracks from Ballroom using the ground truth from Percival and Tzanetakis (2014). Mean OE1 values are marked in black. (b) Genre distribution in Ballroom.

DOI: https://doi.org/10.5334/tismir.43 | Journal eISSN: 2514-3298
Language: English
Submitted on: Oct 16, 2019
Accepted on: Jul 7, 2020
Published on: Aug 24, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Hendrik Schreiber, Julián Urbano, Meinard Müller, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.