
Figure 1
Illustration of the original offline PLP-based algorithm (left side), as described in Section 2, and the real-time procedure (right side), detailed in Section 3. (a) Audio signal. (b) Activation function. (c) Tempogram. (d) Pulse kernels. (e) PLP function. To provide clearer visualization and illustrate the general idea, we plot kernels only at 2-second intervals.

Figure 2
The output of the real-time beat tracking system: (a) Beat Detection (Section 4.1). (b) Beat Lookahead (Section 4.3). (c) Beat Stability (Section 4.4). (d) Inter Beat Interval (Section 4.5).
Table 1
Overview of the datasets used for evaluation.
| Name | Tracks (total) | Length (total) | Type | Duration (track avg.) | Tempo (track avg.) | Stability (track avg.) |
|---|---|---|---|---|---|---|
| Ballroom | 698 | 6h 03m | Excerpt | | | |
| GTZAN | 993 | 8h 16m | Excerpt | | | |
| Rock | 200 | 12h 53m | Full | | | |
| RWCPop | 100 | 6h 46m | Full | | | |

Figure 3
Beat-wise distribution of inter beat intervals (IBI) in various datasets, considering a tempo resolution bin size of 5 BPM. The tempo range for our online model (30–240 BPM) is indicated with dashed lines.
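
The beat-wise tempo distribution described in this caption can be sketched as follows: each inter beat interval (in seconds) is converted to a tempo value via 60 / IBI and counted in 5-BPM bins over the 30–240 BPM range. The function name `ibi_tempo_histogram`, its parameters, and the half-open bin convention are illustrative assumptions, not the authors' implementation.

```python
def ibi_tempo_histogram(beat_times, bin_size=5, tempo_min=30, tempo_max=240):
    """Count beat-wise tempi (60 / IBI) in fixed-width BPM bins.

    Hypothetical helper for illustration. Bins are half-open,
    [tempo_min + k * bin_size, tempo_min + (k + 1) * bin_size).
    Returns the bin counts and the share of intervals inside the range.
    """
    n_bins = (tempo_max - tempo_min) // bin_size
    counts = [0] * n_bins
    n_total = 0
    n_in_range = 0
    for prev, curr in zip(beat_times, beat_times[1:]):
        ibi = curr - prev              # inter beat interval in seconds
        if ibi <= 0:
            continue                   # skip duplicate or unsorted annotations
        tempo = 60.0 / ibi             # local tempo in BPM
        n_total += 1
        if tempo_min <= tempo < tempo_max:
            counts[int((tempo - tempo_min) // bin_size)] += 1
            n_in_range += 1
    share = n_in_range / n_total if n_total else 0.0
    return counts, share


# Three intervals of 0.5 s (120 BPM) and one of 3.0 s (20 BPM, out of range).
counts, share = ibi_tempo_histogram([0.0, 0.5, 1.0, 1.5, 4.5])
```

The returned share indicates how many annotated intervals fall inside the tempo range marked by the dashed lines.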
Table 2
Comparison of low-latency online beat trackers under specific conditions (C1, …, C5) and against existing literature in terms of beat tracking performance, latency, and tempo range, using the GTZAN dataset. TR40 denotes a tempo range around the average track tempo, and GT denotes ground-truth activation.
| Model | Mode | Comments | F1-score (%) | Latency (ms) | Tempo (BPM) |
|---|---|---|---|---|---|
| RNN-PLP-On | Online | our model | 74.72 | 11.61 | 30 - 240 |
| RNN-PLP-On-Zero | Online | our model (zero latency) | 74.68 | 0.00 | 30 - 240 |
| Exploratory Studies: Oracle Conditions | |||||
| RNN-PLP-On-TR40 | Online | (C1) use avg. track tempo | 75.11 | 11.61 | track (mean) |
| GT-PLP-On | Online | (C2) use GT activation | 91.93 | 11.61 | 30 - 240 |
| RNN-PLP-Off | Offline | (C3) use non-causal data | 79.07 | – | 30 - 240 |
| RNN-PLP-Off-TR40 | Offline | (C4) use avg. track tempo | 82.00 | – | track (mean) |
| GT-PLP-Off | Offline | (C5) use GT activation | 97.83 | – | 30 - 240 |
| Methods Overview: Comparing with Literature | |||||
| BEAST-1 | Online | Chang and Su (2024) | 80.04 | 46.44 | 55 - 215 |
| Novel-1D | Online | Heydari et al. (2022) | 76.48 | 20.00 | 55 - 215 |
| BeatNet | Online | Heydari et al. (2021) | 75.44 | 20.00 | 55 - 215 |
| Böck-FF | Online | Böck et al. (2014) | 74.18 | 46.44 | 55 - 215 |
| SpecTNT-TCN | Offline | Hung et al. (2022) | 88.7 | – | – |
| Transformer | Offline | Zhao et al. (2022) | 88.5 | – | – |
| TCN | Offline | Böck and Davies (2020) | 88.5 | – | – |
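
As a reading aid for the F1-score column, the following is a minimal sketch of a beat-tracking F1-score under stated assumptions: an estimated beat counts as a hit if it lies within a tolerance window of an unmatched reference beat, and the window of ±70 ms is a common convention assumed here, not a value given in this section. The function `beat_f1` and its greedy one-to-one matching are illustrative and may differ from the evaluation code behind the table.

```python
def beat_f1(reference, estimated, tolerance=0.07):
    """F1-score between reference and estimated beat times (in seconds).

    Hypothetical helper: the +/- 70 ms tolerance is an assumed default.
    Beats are matched greedily, one-to-one, in temporal order.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1                  # matched pair, consume both beats
            i += 1
            j += 1
        elif diff < 0:
            j += 1                     # estimate too early: false positive
        else:
            i += 1                     # reference beat missed: false negative
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


# Three of four estimates fall within 70 ms of a reference beat.
score = beat_f1([1.0, 2.0, 3.0, 4.0], [1.02, 2.5, 3.05, 4.0])
```

In this example, precision and recall are both 3/4, so the F1-score is 0.75.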

Figure 4
F1-score and L-correct metric for different activation functions and various post-processing methods on the GTZAN dataset. TR40 denotes a tempo range around the average track tempo.

Figure 5
F1-score for different kernel sizes of PLP-On across various activation functions on different datasets.

Figure 6
F1-score of PLP-On for various lookahead settings, activation functions, and datasets; see Table 3 for the numbers.
Table 3
F1-score of PLP-On for lookahead settings in frames (and milliseconds), for different activation functions and datasets. Each F1-score is accompanied by its difference (in parentheses) to the zero-lookahead setting.
| Activation | Dataset | 0 frames (0.0 ms) | 1 frame (11.6 ms) | 10 frames (116.1 ms) | 50 frames (580.5 ms) | 100 frames (1161.0 ms) | 200 frames (2322.0 ms) |
|---|---|---|---|---|---|---|---|
| RNN | GTZAN | 74.72 | | | | | |
| RNN | Ballroom | 84.39 | | | | | |
| RNN | RWCPop | 78.22 | | | | | |
| RNN | Rock | 79.74 | | | | | |
| GT | GTZAN | 91.93 | | | | | |
| GT | Ballroom | 94.44 | | | | | |
| GT | RWCPop | 96.10 | | | | | |
| GT | Rock | 95.39 | | | | | |
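
The millisecond values in the lookahead header follow from multiplying the number of frames by the frame period. The sketch below assumes a frame period of 11.61 ms, taken from the latency reported for one frame in Table 2; the constant `FRAME_PERIOD_MS` and the function `lookahead_ms` are hypothetical names used only for illustration.

```python
FRAME_PERIOD_MS = 11.61  # assumed frame period, read off the one-frame latency


def lookahead_ms(frames, frame_period_ms=FRAME_PERIOD_MS):
    """Convert a lookahead given in frames to milliseconds (one decimal)."""
    return round(frames * frame_period_ms, 1)


# Lookahead settings listed in the table header.
settings = {n: lookahead_ms(n) for n in (0, 1, 10, 50, 100, 200)}
```

Under this assumption, the computed values agree with the header entries, for example 10 frames correspond to 116.1 ms and 200 frames to 2322.0 ms.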

Figure 7
A block diagram of the beatcli.py terminal application. (A) Input arguments. (B) Audio input. (C) Audio analysis. (D) Terminal output. (E) Network output. (F) Receiving software. (G) Receiving hardware.

Figure 8
The help function of the beatcli.py application with information about input arguments.

Figure 9
The terminal output of the beatcli.py application showing the system in action.

Figure 10
The educational music game “Rock Your Beats” (bottom) with the corresponding real-time PLP buffer (top), used to derive the positions of “beat creatures” in the game world.
