A Real-Time Beat Tracking System with Zero Latency and Enhanced Controllability

Open Access | Oct 2024

Figures & Tables

Figure 1

Illustration of the original offline PLP-based algorithm (left side), as described in Section 2, and the real-time procedure (right side), detailed in Section 3. (a) Audio signal. (b) Activation function. (c) Tempogram. (d) Pulse kernels. (e) PLP function. To provide clearer visualization and illustrate the general idea, we plot kernels only at 2-second intervals.
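
The overlap-add step that turns the pulse kernels in (d) into the PLP function in (e) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a constant, already-known tempo and phase, so the tempogram analysis in (c) is skipped. It also assumes a Hann-windowed cosine as the kernel shape. The names `pulse_kernel`, `plp_curve`, and `local_maxima`, as well as the frame rate, kernel length, and hop values, are hypothetical choices made for this sketch.

```python
import math

def pulse_kernel(center, tempo_bpm, phase, kernel_len, frame_rate):
    """Hann-windowed cosine centered at frame `center` (hypothetical helper)."""
    half = kernel_len // 2
    freq = tempo_bpm / 60.0 / frame_rate  # pulse frequency in cycles per frame
    kernel = {}
    for n in range(center - half, center + half + 1):
        window = 0.5 * (1.0 + math.cos(math.pi * (n - center) / half))  # Hann window
        kernel[n] = window * math.cos(2.0 * math.pi * (freq * n - phase))
    return kernel

def plp_curve(num_frames, tempo_bpm, phase=0.0, kernel_len=301, hop=10, frame_rate=100):
    """Overlap-add all kernels, then keep only the positive part."""
    curve = [0.0] * num_frames
    for center in range(0, num_frames, hop):
        kernel = pulse_kernel(center, tempo_bpm, phase, kernel_len, frame_rate)
        for n, value in kernel.items():
            if 0 <= n < num_frames:
                curve[n] += value
    return [max(0.0, v) for v in curve]  # half-wave rectification

def local_maxima(curve):
    """Frame indices of local peaks, used here as beat candidates."""
    return [n for n in range(1, len(curve) - 1)
            if curve[n - 1] < curve[n] >= curve[n + 1]]

plp = plp_curve(num_frames=1000, tempo_bpm=120.0)
beats = [n for n in local_maxima(plp) if 200 <= n <= 800]
```

With the assumed frame rate of 100 frames per second and a tempo of 120 BPM, the peaks away from the signal boundaries fall every 50 frames (0.5 s). In a real system the tempo and phase of each kernel would vary over time, which this sketch does not model.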

Figure 2

The output of the real-time beat tracking system: (a) Beat Detection (Section 4.1). (b) Beat Lookahead (Section 4.3). (c) Beat Stability (Section 4.4). (d) Inter Beat Interval (Section 4.5).

Table 1

Overview of the datasets used for evaluation.

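| Name | Tracks (total) | Length (total) | Type | Duration (track avg.) | Tempo (track avg.) | Stability (track avg.) |
|---|---|---|---|---|---|---|
| Ballroom | 698 | 6h 03m | Excerpt | | | |
| GTZAN | 993 | 8h 16m | Excerpt | | | |
| Rock | 200 | 12h 53m | Full | | | |
| RWCPop | 100 | 6h 46m | Full | | | |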
Figure 3

Beat-wise distribution of inter beat intervals (IBI) in various datasets, considering a tempo resolution bin size of 5 BPM. The tempo range for our online model (30–240 BPM) is indicated with dashed lines.
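
A beat-wise histogram of this kind can be sketched from a list of annotated beat times. The snippet below is only an illustration under stated assumptions: the function name `ibi_tempo_histogram` and its parameters are hypothetical, each inter beat interval is converted to a tempo via 60 / IBI, and the 5 BPM bin size and the 30–240 BPM range are taken from the caption.

```python
from collections import Counter

def ibi_tempo_histogram(beat_times, bin_size_bpm=5, tempo_min=30.0, tempo_max=240.0):
    """Return ({bin_start_bpm: count}, number of IBIs outside the tempo range)."""
    histogram = Counter()
    out_of_range = 0
    for previous, current in zip(beat_times, beat_times[1:]):
        ibi = current - previous  # inter beat interval in seconds
        if ibi <= 0:
            continue  # skip duplicated or unsorted annotations
        tempo = 60.0 / ibi  # local tempo in BPM for this beat pair
        histogram[int(tempo // bin_size_bpm) * bin_size_bpm] += 1
        if not tempo_min <= tempo <= tempo_max:
            out_of_range += 1
    return dict(histogram), out_of_range

# Three beats at 120 BPM followed by a 3-second gap (20 BPM, outside the range).
bins, outside = ibi_tempo_histogram([0.0, 0.5, 1.0, 1.5, 4.5])
```

Counting per beat pair rather than per track means that long tracks and tracks with tempo changes contribute in proportion to their number of beats.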

Table 2

Comparison of low-latency online beat trackers under specific conditions (C1, ..., C5) and against existing literature in terms of beat tracking performance, latency, and tempo range, using the GTZAN dataset. TR40 denotes a tempo range defined relative to the average track tempo, and GT denotes ground-truth activation.

| Model | Mode | Comments | F1-score (%) | Latency (ms) | Tempo (BPM) |
|---|---|---|---|---|---|
| RNN-PLP-On | Online | our model | 74.72 | 11.61 | 30 - 240 |
| RNN-PLP-On-Zero | Online | our model (zero latency) | 74.68 | 0.00 | 30 - 240 |
| Exploratory Studies: Oracle Conditions | | | | | |
| RNN-PLP-On-TR40 | Online | (C1) use avg. track tempo | 75.11 | 11.61 | track (mean) |
| GT-PLP-On | Online | (C2) use GT activation | 91.93 | 11.61 | 30 - 240 |
| RNN-PLP-Off | Offline | (C3) use non-causal data | 79.07 | | 30 - 240 |
| RNN-PLP-Off-TR40 | Offline | (C4) use avg. track tempo | 82.00 | | track (mean) |
| GT-PLP-Off | Offline | (C5) use GT activation | 97.83 | | 30 - 240 |
| Methods Overview: Comparing with Literature | | | | | |
| BEAST-1 | Online | Chang and Su (2024) | 80.04 | 46.44 | 55 - 215 |
| Novel-1D | Online | Heydari et al. (2022) | 76.48 | 20.00 | 55 - 215 |
| BeatNet | Online | Heydari et al. (2021) | 75.44 | 20.00 | 55 - 215 |
| Böck-FF | Online | Böck et al. (2014) | 74.18 | 46.44 | 55 - 215 |
| SpecTNT-TCN | Offline | Hung et al. (2022) | 88.7 | | |
| Transformer | Offline | Zhao et al. (2022) | 88.5 | | |
| TCN | Offline | Böck and Davies (2020) | 88.5 | | |
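
For readers unfamiliar with the F1-score reported in Table 2, the sketch below shows one common way to compute it for beat tracking. It rests on assumptions not stated in this section: a tolerance window of ±70 ms around each reference beat (a widespread convention in beat tracking evaluation) and a simple greedy one-to-one matching. The function name `beat_f1` is hypothetical, and the evaluation code used for the table may differ in detail.

```python
def beat_f1(reference, estimated, tolerance=0.07):
    """F1-score for beat times in seconds, with greedy one-to-one matching.

    `tolerance` (assumed here to be 70 ms) is the maximum deviation for a hit.
    """
    reference, estimated = sorted(reference), sorted(estimated)
    i = j = hits = 0
    while i < len(reference) and j < len(estimated):
        diff = estimated[j] - reference[i]
        if abs(diff) <= tolerance:
            hits += 1  # estimate matches this reference beat
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate lies before the reference beat: false positive
        else:
            i += 1  # reference beat has no estimate nearby: miss
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)

# 3 hits out of 5 estimates and 4 reference beats: precision 0.6, recall 0.75.
score = beat_f1([0.5, 1.0, 1.5, 2.0], [0.52, 1.0, 1.6, 2.03, 2.5])
```

Greedy matching is sufficient when beats are spaced much further apart than the tolerance window, as is the case for the 30–240 BPM range considered here.
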
Figure 4

F1-score and L-correct metric for different activation functions and various post-processing methods on the GTZAN dataset. TR40 denotes a tempo range defined relative to the average track tempo.

Figure 5

F1-score for different kernel sizes of PLP-On across various activation functions on different datasets.

Figure 6

F1-score of PLP-On for various lookahead settings and activation functions across different datasets; see Table 3 for the numbers.

Table 3

F1-score of PLP-On for lookahead settings in frames (and milliseconds), for different activation functions across different datasets. Each F1-score is accompanied by its difference (in parentheses) relative to zero lookahead.

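F1-score (%) by lookahead in frames (ms):

| Activation | Dataset | 0 (0.0) | 1 (11.6) | 10 (116.1) | 50 (580.5) | 100 (1161.0) | 200 (2322.0) |
|---|---|---|---|---|---|---|---|
| RNN | GTZAN | 74.72 | | | | | |
| RNN | Ballroom | 84.39 | | | | | |
| RNN | RWCPop | 78.22 | | | | | |
| RNN | Rock | 79.74 | | | | | |
| GT | GTZAN | 91.93 | | | | | |
| GT | Ballroom | 94.44 | | | | | |
| GT | RWCPop | 96.10 | | | | | |
| GT | Rock | 95.39 | | | | | |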
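
The millisecond values in the lookahead header follow from the frame counts by a simple multiplication. The sketch below assumes a frame duration of 11.61 ms, which matches both the one-frame entry in the header and the 11.61 ms latency listed in Table 2. The constant `FRAME_MS` and the function `lookahead_ms` are hypothetical names.

```python
FRAME_MS = 11.61  # assumed duration of one frame in milliseconds

def lookahead_ms(frames, frame_ms=FRAME_MS):
    """Convert a lookahead given in frames to milliseconds (one decimal)."""
    return round(frames * frame_ms, 1)

for frames in (0, 1, 10, 50, 100, 200):
    print(f"{frames} frames = {lookahead_ms(frames)} ms")
```

The largest setting of 200 frames therefore corresponds to a lookahead of roughly 2.3 seconds.
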
Figure 7

A block diagram of the beatcli.py terminal application. (A) Input arguments. (B) Audio input. (C) Audio analysis. (D) Terminal output. (E) Network output. (F) Receiving software. (G) Receiving hardware.

Figure 8

The help function of the beatcli.py application with information about input arguments.

Figure 9

The terminal output of the beatcli.py application showing the system in action.

Figure 10

The educational music game “Rock Your Beats” (bottom) with the corresponding real-time PLP buffer (top), used to derive the positions of “beat creatures” in the game world.

DOI: https://doi.org/10.5334/tismir.189 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 2, 2024
Accepted on: Aug 1, 2024
Published on: Oct 1, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Peter Meier, Ching-Yu Chiu, Meinard Müller, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.