
Cross-Modal Approaches to Beat Tracking: A Case Study on Chopin Mazurkas

Open Access | May 2025

Figures & Tables

Figure 1

Beat activity estimation of an audio representation using a frame‑based approach (left) and a symbolic representation using an event‑based approach (right).

Figure 2

Overview of the system. F2E: frame‑to‑event conversion; E2F: event‑to‑frame conversion.

Table 1

The five Chopin Mazurkas and their identifiers used in our study. The last three columns indicate the number of beats, performances, and total duration (in hours) available for the respective piece. Dur.: duration; h: hours; ID: identifier; Perf.: performances; Op.: Opus.

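| ID | Piece | Number (Beats) | Number (Perf.) | Dur. (h) |
|---|---|---|---|---|
| M17‑4 | Op. 17, No. 4 | 396 | 64 | 4.62 |
| M24‑2 | Op. 24, No. 2 | 360 | 64 | 2.44 |
| M30‑2 | Op. 30, No. 2 | 193 | 34 | 0.80 |
| M63‑3 | Op. 63, No. 3 | 229 | 88 | 3.15 |
| M68‑3 | Op. 68, No. 3 | 181 | 51 | 1.43 |
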
Figure 3

Processing of beat activation functions. (a) Frame‑based activation function from an audio activity estimator. (b) Gaussian smoothing of (a). (c) Max normalization of (b). (d) Peak‑picking results of (c). (e) Event‑based activation function from a symbolic activity estimator. (f) Event‑to‑frame conversion of (e). (g) Gaussian smoothing of (f). (h) Max normalization of (g). Red vertical lines indicate reference annotated beats. Red regions highlight the 70 ms tolerance window.
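
The processing chain in Figure 3 can be illustrated with a minimal sketch. It is not the authors' implementation: the frame rate of 100 frames per second, the Gaussian width `sigma_frames`, the threshold value, and all function names are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math

def events_to_frames(event_times, event_values, n_frames, fps=100):
    """E2F: place each event-based activation on its nearest frame (assumed fps)."""
    frames = [0.0] * n_frames
    for t, v in zip(event_times, event_values):
        idx = int(round(t * fps))
        if 0 <= idx < n_frames:
            frames[idx] = max(frames[idx], v)
    return frames

def gaussian_smooth(x, sigma_frames=2.0):
    """Convolve x with a unit-sum Gaussian kernel truncated at 3 sigma."""
    radius = int(math.ceil(3 * sigma_frames))
    kernel = [math.exp(-0.5 * (k / sigma_frames) ** 2)
              for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def max_normalize(x):
    """Scale x so that its largest value becomes 1 (no-op for all-zero input)."""
    peak = max(x, default=0.0)
    return [v / peak for v in x] if peak > 0 else list(x)

def pick_peaks(x, threshold=0.25):
    """Return frame indices of local maxima at or above a fixed threshold."""
    return [i for i in range(1, len(x) - 1)
            if x[i] >= threshold and x[i] > x[i - 1] and x[i] >= x[i + 1]]

# Toy symbolic input: two note events at 0.5 s and 1.0 s (hypothetical values).
frames = events_to_frames([0.5, 1.0], [0.9, 0.6], n_frames=200)   # step (f)
activation = max_normalize(gaussian_smooth(frames))               # steps (g), (h)
beat_frames = pick_peaks(activation)                              # as in step (d)
```

For a frame-based audio activation, the same sketch applies from the smoothing step onward, skipping `events_to_frames`.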

Table 2

Work‑wise average beat‑tracking results for pretrained models. (Top) madmom‑based audio beat trackers. (Bottom) PM2S‑based symbolic beat trackers. The best results are highlighted in bold. GLB: global; LOC: local.

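Columns F1, P, and R belong to the F‑measure group; columns F‑L2, F‑L3, and F‑L4 belong to the L‑correct group. The second value in each cell is given in parentheses.

Audio beat trackers (ABTs)

| Threshold | F1 | P | R | F‑L2 | F‑L3 | F‑L4 |
|---|---|---|---|---|---|---|
| GLB‑0.01 | 0.757 (0.054) | 0.632 (0.071) | 0.968 (0.020) | 0.549 (0.116) | 0.476 (0.130) | 0.409 (0.140) |
| GLB‑0.1 | 0.825 (0.053) | 0.730 (0.074) | 0.963 (0.021) | 0.698 (0.101) | 0.639 (0.119) | 0.565 (0.140) |
| GLB‑0.25 | 0.886 (0.048) | 0.877 (0.056) | 0.901 (0.053) | 0.831 (0.080) | 0.795 (0.098) | 0.746 (0.125) |
| GLB‑0.5 | 0.660 (0.108) | 0.955 (0.037) | 0.518 (0.118) | 0.487 (0.160) | 0.372 (0.174) | 0.284 (0.161) |
| Oracle | 0.892 (0.045) | 0.866 (0.061) | 0.923 (0.044) | 0.842 (0.071) | 0.809 (0.089) | 0.759 (0.120) |
| LOC‑5 | 0.835 (0.047) | 0.748 (0.066) | 0.958 (0.024) | 0.745 (0.080) | 0.696 (0.098) | 0.628 (0.121) |
| LOC‑10 | 0.840 (0.048) | 0.755 (0.067) | 0.958 (0.024) | 0.746 (0.082) | 0.696 (0.100) | 0.627 (0.124) |
| LOC‑20 | 0.838 (0.048) | 0.751 (0.068) | 0.960 (0.024) | 0.737 (0.084) | 0.686 (0.102) | 0.616 (0.125) |

Symbolic beat trackers (SBTs)

| Threshold | F1 | P | R | F‑L2 | F‑L3 | F‑L4 |
|---|---|---|---|---|---|---|
| GLB‑0.01 | 0.823 (0.045) | 0.741 (0.064) | 0.937 (0.038) | 0.700 (0.092) | 0.606 (0.123) | 0.497 (0.148) |
| GLB‑0.1 | 0.836 (0.056) | 0.877 (0.064) | 0.804 (0.072) | 0.723 (0.101) | 0.648 (0.129) | 0.568 (0.149) |
| GLB‑0.25 | 0.775 (0.070) | 0.913 (0.057) | 0.679 (0.091) | 0.589 (0.135) | 0.488 (0.161) | 0.400 (0.175) |
| GLB‑0.5 | 0.662 (0.092) | 0.935 (0.052) | 0.522 (0.107) | 0.371 (0.163) | 0.258 (0.174) | 0.201 (0.168) |
| Oracle | 0.855 (0.048) | 0.845 (0.072) | 0.870 (0.052) | 0.767 (0.084) | 0.698 (0.114) | 0.621 (0.138) |
| LOC‑5 | 0.841 (0.056) | 0.915 (0.055) | 0.782 (0.069) | 0.745 (0.100) | 0.676 (0.131) | 0.602 (0.152) |
| LOC‑10 | 0.842 (0.056) | 0.915 (0.055) | 0.783 (0.068) | 0.746 (0.099) | 0.677 (0.129) | 0.604 (0.152) |
| LOC‑20 | 0.844 (0.055) | 0.915 (0.055) | 0.786 (0.067) | 0.750 (0.097) | 0.682 (0.128) | 0.610 (0.150) |
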
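
As an illustration of how the F1, P, and R columns can be obtained, the following sketch computes precision, recall, and F-measure for one performance using the 70 ms tolerance window mentioned in the figure captions. The greedy one-to-one matching and the function name `beat_f_measure` are assumptions made here for illustration; the evaluation code used in the study may differ in detail.

```python
def beat_f_measure(reference, estimated, tolerance=0.07):
    """Return (F1, precision, recall) for beat times given in seconds.

    An estimate counts as correct if it lies within +/- tolerance of a
    reference beat; each reference beat can be matched at most once.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1          # matched pair: consume both
            i += 1
            j += 1
        elif diff < 0:
            j += 1             # estimate precedes the window: false positive
        else:
            i += 1             # reference beat was missed: false negative
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    f1 = 2 * precision * recall / (precision + recall)
    return f1, precision, recall

# Toy example with hypothetical beat times (seconds).
reference_beats = [0.5, 1.0, 1.5, 2.0]
estimated_beats = [0.52, 1.2, 1.49, 2.05, 2.5]
f1, p, r = beat_f_measure(reference_beats, estimated_beats)
```

In this toy case three of the five estimates fall inside a tolerance window, so precision is 3/5 and recall is 3/4. A work-wise average would then aggregate such per-performance scores for each piece.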
Table 3

Work‑wise average of beat‑tracking results, including late‑fusion approaches. All results were derived using peak‑picking with a local average threshold and a window length of 20 seconds (LOC‑20). The best results are highlighted in bold.

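Columns F1, P, and R belong to the F‑measure group; columns F‑L2, F‑L3, and F‑L4 belong to the L‑correct group. The second value in each cell is given in parentheses. The entries of the Activation column are not available in this text version.

Pretrained

| Activation | F1 | P | R | F‑L2 | F‑L3 | F‑L4 |
|---|---|---|---|---|---|---|
|  | 0.838 (0.048) | 0.751 (0.068) | 0.960 (0.024) | 0.737 (0.084) | 0.686 (0.102) | 0.616 (0.125) |
|  | 0.844 (0.055) | 0.915 (0.055) | 0.786 (0.067) | 0.750 (0.097) | 0.682 (0.128) | 0.610 (0.150) |
|  | 0.885 (0.044) | 0.838 (0.063) | 0.947 (0.027) | 0.823 (0.071) | 0.790 (0.086) | 0.744 (0.109) |
|  | 0.850 (0.051) | 0.959 (0.035) | 0.765 (0.067) | 0.749 (0.093) | 0.684 (0.121) | 0.610 (0.142) |

Retrained

| Activation | F1 | P | R | F‑L2 | F‑L3 | F‑L4 |
|---|---|---|---|---|---|---|
|  | 0.816 (0.038) | 0.716 (0.054) | 0.965 (0.014) | 0.696 (0.068) | 0.606 (0.087) | 0.468 (0.100) |
|  | 0.927 (0.037) | 0.919 (0.042) | 0.938 (0.037) | 0.902 (0.053) | 0.882 (0.066) | 0.858 (0.082) |
|  | 0.860 (0.036) | 0.786 (0.054) | 0.962 (0.015) | 0.772 (0.064) | 0.712 (0.079) | 0.625 (0.107) |
|  | 0.937 (0.034) | 0.943 (0.031) | 0.933 (0.041) | 0.917 (0.047) | 0.899 (0.059) | 0.882 (0.069) |
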
Figure 4

Comparison of four types of activations. (top) Music score of Op. 30, No. 2. (left) Activation functions from pretrained models. (right) Activation functions from retrained models. Red regions highlight the 70 ms tolerance window. Blue vertical lines indicate the beat estimates derived using peak‑picking with the LOC‑20 threshold setting. Op.: Opus.

Figure 5

Effects of peak‑picking thresholds on beat‑tracking F1 scores. (a) Pretrained models. (b) Retrained models. Dashed lines indicate the F1 scores of the corresponding results derived using the local average threshold (LOC‑20). Solid dots indicate the F1 scores derived using global threshold values.
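
The difference between the two threshold types compared in Figure 5 can be sketched as follows. The exact definition of the local average threshold is not given in this section, so the code assumes that the threshold at each frame is the mean of the activation inside a window centered on that frame; the function names, the frame rate, and the toy data are likewise illustrative assumptions.

```python
def local_average(x, window_sec=20.0, fps=100):
    """Per-frame mean of x over a centered window (truncated at the borders)."""
    half = int(window_sec * fps / 2)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)      # prefix sums for O(1) window means
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

def pick_peaks(x, threshold):
    """Local maxima of x at or above a threshold.

    threshold is either one number (global, GLB-style) or a list with one
    value per frame (local, LOC-style).
    """
    thr = threshold if isinstance(threshold, list) else [threshold] * len(x)
    return [i for i in range(1, len(x) - 1)
            if x[i] >= thr[i] and x[i] > x[i - 1] and x[i] >= x[i + 1]]

# Toy activation: a loud passage followed by a soft one (1 frame per second
# and a 4-second window, purely to keep the example small).
x = [0.0, 1.0, 0.0, 0.9, 0.0, 0.0, 0.2, 0.0, 0.25, 0.0]
global_peaks = pick_peaks(x, 0.5)
local_peaks = pick_peaks(x, local_average(x, window_sec=4, fps=1))
```

In this toy example the global threshold of 0.5 keeps only the two strong peaks, whereas the local average adapts to the soft passage and also keeps its two weaker peaks.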

Figure 6

Beat‑tracking F1 scores of Maz‑5.

Table 4

Work‑wise average of downbeat‑tracking results, including late‑fusion approaches. All results were derived using peak‑picking with a local average threshold and a window length of 20 seconds (LOC‑20). The best results are highlighted in bold.

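Columns F1, P, and R belong to the F‑measure group; columns F‑L2, F‑L3, and F‑L4 belong to the L‑correct group. The second value in each cell is given in parentheses. The entries of the Activation column are not available in this text version.

Pretrained

| Activation | F1 | P | R | F‑L2 | F‑L3 | F‑L4 |
|---|---|---|---|---|---|---|
|  | 0.450 (0.039) | 0.301 (0.030) | 0.897 (0.061) | 0.019 (0.013) | 0.008 (0.005) | 0.007 (0.002) |
|  | 0.435 (0.049) | 0.309 (0.036) | 0.744 (0.093) | 0.027 (0.021) | 0.013 (0.014) | 0.010 (0.010) |
|  | 0.459 (0.040) | 0.312 (0.029) | 0.876 (0.069) | 0.019 (0.012) | 0.008 (0.006) | 0.007 (0.002) |
|  | 0.448 (0.064) | 0.347 (0.050) | 0.640 (0.105) | 0.084 (0.047) | 0.032 (0.028) | 0.018 (0.019) |

Retrained

| Activation | F1 | P | R | F‑L2 | F‑L3 | F‑L4 |
|---|---|---|---|---|---|---|
|  | 0.401 (0.026) | 0.254 (0.020) | 0.961 (0.029) | 0.010 (0.004) | 0.005 (0.001) | 0.005 (0.001) |
|  | 0.657 (0.071) | 0.561 (0.072) | 0.814 (0.080) | 0.480 (0.100) | 0.432 (0.102) | 0.396 (0.101) |
|  | 0.434 (0.026) | 0.281 (0.021) | 0.971 (0.025) | 0.016 (0.007) | 0.008 (0.004) | 0.007 (0.003) |
|  | 0.671 (0.073) | 0.591 (0.072) | 0.791 (0.082) | 0.519 (0.103) | 0.471 (0.109) | 0.433 (0.111) |
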
DOI: https://doi.org/10.5334/tismir.238 | Journal eISSN: 2514-3298
Language: English
Submitted on: Nov 12, 2024
Accepted on: Apr 1, 2025
Published on: May 2, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Ching-Yu Chiu, Lele Liu, Christof Weiß, Meinard Müller, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.