BeatNet+: Real‑Time Rhythm Analysis for Diverse Music Audio
By: Mojtaba Heydari and Zhiyao Duan
Open Access | Dec 2024

Figures & Tables

Figure 1

Neural structure of BeatNet+ for general music rhythm analysis. Both the main (left) and auxiliary (right) branches are initialized randomly and trained jointly, but only the main branch is utilized for inference.

Figure 2

Neural structure of the auxiliary‑freezing (AF) adaptation approach for singing voice rhythm analysis. The main branch (left) is initialized randomly and trained for real‑time inference, while the auxiliary branch (right) is initialized with the pre‑trained BeatNet+ main branch weights and remains frozen during training.
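The key mechanic of AF adaptation, freezing the pre-trained auxiliary branch while only the main branch receives gradient updates, can be sketched as follows. This is a minimal stand-in using toy scalar parameters, not the actual recurrent BeatNet+ branches; the `Param` class and `sgd_step` helper are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Param:
    """Toy stand-in for a network parameter with a freeze flag."""
    value: float
    grad: float = 0.0
    trainable: bool = True

# Toy branches: in BeatNet+ these are recurrent networks, not scalars.
main_branch = [Param(0.1), Param(-0.2)]          # trained for real-time inference
aux_branch = [Param(0.1), Param(-0.2)]           # imagine loaded from pre-trained BeatNet+

# AF adaptation: freeze every auxiliary parameter so only the main
# branch is updated during singing-voice training.
for p in aux_branch:
    p.trainable = False

def sgd_step(params, lr=0.01):
    """Apply one gradient step, skipping frozen parameters."""
    for p in params:
        if p.trainable:
            p.value -= lr * p.grad

# One training step: both branches get gradients, only main moves.
for p in main_branch + aux_branch:
    p.grad = 1.0
sgd_step(main_branch + aux_branch)
```

After the step, the main branch's values shift while the auxiliary branch's values are unchanged, which is the property AF relies on.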

Figure 3

Illustration of the guided fine‑tuning (GF) approach for singing voice rhythm analysis. The model is initialized with the pre‑trained BeatNet+ main branch weights and fine‑tuned using music mixtures with backing music gradually removed over training epochs.
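The gradual removal of backing music in GF can be sketched as a gain schedule applied when mixing stems. The linear fade and the `backing_gain` / `mix` helpers below are illustrative assumptions; the paper's exact curriculum may differ.

```python
def backing_gain(epoch: int, total_epochs: int) -> float:
    """Linear fade-out of the backing track over training epochs:
    full mixture at epoch 0, vocals only by the final epoch."""
    return max(0.0, 1.0 - epoch / (total_epochs - 1))

def mix(vocal: list[float], backing: list[float], gain: float) -> list[float]:
    """Mix a vocal stem with an attenuated backing track, sample by sample."""
    return [v + gain * b for v, b in zip(vocal, backing)]

total = 10
print(backing_gain(0, total))          # 1.0 (full mixture)
print(backing_gain(total - 1, total))  # 0.0 (vocals only)
```

With this schedule, the model starts from the distribution it was pre-trained on (full mixtures) and is nudged toward isolated singing voice without an abrupt domain shift.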

Table 1

Datasets used in our experiments.

| Dataset | Number of pieces | Number of vocals | Labels | Train | Validation | Test |
|---|---|---|---|---|---|---|
| Ballroom | 699 | 452 | Original | | | |
| GTZAN | 999 | 741 | Original | | | |
| Hainsworth | 220 | 154 | Original | | | |
| Rock Corpus | 200 | 315 | Original | | | |
| MUSDB18 | 150 | 263 | Added | | | |
| URSing | 65 | 106 | Added | | | |
| RWC jazz | 50 | 0 | Revised | | | |
| RWC pop | 100 | 188 | Revised | | | |
| RWC‑Royalty‑free | 15 | 29 | Revised | | | |
Table 2

Results of the online rhythm analysis evaluation for generic music, with offline state‑of‑the‑art references, showing F1 scores in percentages with a tolerance window of 70 ms, latency, and RTF on the GTZAN dataset.

Metrics (performance on full mixtures)

| Method | Beat F1 ↑ (70 ms) | Downbeat F1 ↑ (70 ms) | Latency ↓ (ms) | RTF ↓ |
|---|---|---|---|---|
| Online models | | | | |
| BeatNet+ | 80.62 | 56.51 | 20 | 0.08 |
| BeatNet+ (Solo) | 78.43 | 49.74 | 20 | 0.08 |
| BeatNet (Heydari et al., 2021) | 75.44 | 46.69 | 20 | 0.06 |
| Novel 1D (Heydari et al., 2022) | 76.47 | 42.57 | 20 | 0.02 |
| IBT (Oliveira et al., 2010) | 68.99 | – | 23 | 0.16 |
| Böck FF (Böck et al., 2014) | 74.18 | – | 46 | 0.05 |
| BEAST (Chang and Su, 2024) | 80.04 | 52.23 | 46 | 0.40 |
| Offline models | | | | |
| Transformers (Zhao et al., 2022) | 88.5 | 71.4 | – | – |
| SpecTNT‑TCN (Hung et al., 2022) | 88.7 | 75.6 | – | – |
Figure 4

F1 scores for beat tracking and downbeat tracking of the BeatNet+ model across diverse genres within the GTZAN dataset.

Figure 5

F1 scores of online rhythm analysis models on singing voices (top row) and non‑percussion music (bottom row) with two tolerance windows, 70 ms and 200 ms.
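The F1 score with a tolerance window used throughout these evaluations can be sketched as below. This greedy one-to-one matcher is a simplified stand-in for the standard mir_eval-style beat evaluation, not the paper's exact implementation; `beat_f1` and its 70 ms default are illustrative.

```python
def beat_f1(est: list[float], ref: list[float], tol: float = 0.07) -> float:
    """F1 between estimated and reference beat times (seconds): an
    estimate counts as a hit if it falls within +/- tol of a reference
    beat, and each estimate may match at most one reference."""
    est = sorted(est)
    used = [False] * len(est)
    tp = 0
    for r in ref:
        for i, e in enumerate(est):
            if not used[i] and abs(e - r) <= tol:
                used[i] = True
                tp += 1
                break
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Estimates within the 70 ms window score a perfect 1.0.
print(beat_f1([0.50, 1.01, 1.49], [0.5, 1.0, 1.5]))  # 1.0
```

Widening the window (e.g. to 200 ms, as in the second column of Figure 5) makes the metric more forgiving of small timing deviations.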

DOI: https://doi.org/10.5334/tismir.198 | Journal eISSN: 2514-3298
Language: English
Submitted on: Apr 1, 2024
Accepted on: Sep 11, 2024
Published on: Dec 6, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Mojtaba Heydari, Zhiyao Duan, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.