Characterising Confounding Effects in Music Classification Experiments through Interventions

Open Access | Aug 2019

Figures & Tables

Figure 1

Pipeline of a single iteration k of a classification experiment evaluating a system construction method m (combination of feature extraction and learning algorithm) on a music collection D. Square-shaped nodes represent data structures; diamond-shaped nodes represent processes. A double border indicates a treatment factor with fixed level. Solid lines indicate information flow; dashed lines join components of the same data structure. π is a data assignment/partitioning function. Dt is the training collection; Dp is the testing collection, with Rp the raw data (e.g., recordings) and Ap the corresponding annotations. (Rt and At omitted for simplicity.) s is the trained system, Âp the predicted annotations, ϕ the performance metric function, and ŷ an estimate of the theoretical performance y (i.e., the performance given the true distribution).
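
The flow described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_iteration`, `partition`, `construct`, and `mean_recall` are hypothetical names, instances are assumed to be tuples whose first two fields are the raw data r and the annotation a, and ϕ is assumed here to be per-class recall averaged over classes.

```python
def mean_recall(A_p, A_hat_p):
    """phi (assumed): recall per class, averaged over the classes in A_p."""
    recalls = []
    for c in sorted(set(A_p)):
        idx = [i for i, a in enumerate(A_p) if a == c]
        recalls.append(sum(A_hat_p[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def run_iteration(D, partition, construct, metric=mean_recall):
    """One iteration k of the pipeline; returns the estimate y_hat."""
    D_t, D_p = partition(D)            # pi: assign instances to train/test
    s = construct(D_t)                 # m: feature extraction + learning
    R_p = [inst[0] for inst in D_p]    # raw test data
    A_p = [inst[1] for inst in D_p]    # reference annotations
    A_hat_p = [s(r) for r in R_p]      # predicted annotations
    return metric(A_p, A_hat_p)        # phi applied to (A_p, A_hat_p)
```

Here `partition` plays the role of π and `construct` the role of m; any callable that returns a function from raw data to a predicted annotation could stand in for the trained system s.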

Algorithm 1

Regulated Bootstrap resampling strategy, given a collection D and a threshold nr ∈ ℕ.

RegulatedBootstrap(D, nr):
- Initialise: Dt ← (∅), Dp ← (∅)
- For each a ∈ A:
    0. Define Da as the instances in D with ai = a;
    1. Phase 1: Stratified Bootstrap Sampling
        (a) Create dt by uniformly sampling with replacement |Da| instances from Da;
        (b) Create dp ← Da \ dt;
    2. Phase 2: Size Verification
        (a) Define Zt as the union of all zi in dt;
        (b) Create dp by selecting all instances (r, a, z)i in dp with zi not in Zt;
        (c) If |dp| < nr, proceed to Phase 3, as it lacks enough regulated instances; otherwise, go to Phase 4;
    3. Phase 3: Curated Sampling
        (a) Define Za as the union of all zi in Da;
        (b) Initialise a hold-out collection dh ← (∅);
        (c) Randomly select a z ∈ Za, and remove it from Za;
        (d) Define dz as the instances in Da with z ∈ zi;
        (e) Append dz to dh: dh ← dh ⌢ dz;
        (f) If |dh| < nr, go to (3c), as dh still lacks enough instances;
        (g) Create dt by uniformly sampling with replacement |Da| instances from Da\dh;
        (h) Create dp ← Da \ dt;
        (i) Go to Phase 2 to check size requirements;
    4. Phase 4: Concatenation
        (a) Append dt to Dt: Dt ← Dt ⌢ dt;
        (b) Append dp to Dp: Dp ← Dp ⌢ dp;
- Return: train/test pair (Dt, Dp)
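
Algorithm 1 can be approximated in Python as below. This is a minimal sketch rather than the authors' implementation. It assumes each instance is a tuple (r, a, z) with a unique identifier r, a class label a, and a frozenset z of regulated values (e.g., artist names). The function name, the `rng` argument, and the `max_attempts` safeguard are hypothetical additions. The hold-out collection dh is tracked as a set of identifiers, so instances are counted once.

```python
import random
from collections import defaultdict


def regulated_bootstrap(D, n_r, rng=None, max_attempts=100):
    """Return one train/test pair (D_t, D_p) drawn from D."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for inst in D:
        by_class[inst[1]].append(inst)

    D_t, D_p = [], []
    for a in sorted(by_class):
        D_a = by_class[a]
        pool = D_a  # Phase 1 samples from all of D_a
        for _ in range(max_attempts):
            # Phase 1 / 3(g)-(h): bootstrap d_t, keep the unsampled rest as d_p.
            d_t = rng.choices(pool, k=len(D_a))
            sampled = {r for r, _, _ in d_t}
            d_p = [inst for inst in D_a if inst[0] not in sampled]
            # Phase 2: drop test instances sharing a regulated value with d_t.
            Z_t = set().union(*(z for _, _, z in d_t))
            d_p = [inst for inst in d_p if not (inst[2] & Z_t)]
            if len(d_p) >= n_r:
                break
            # Phase 3(a)-(f): hold out whole regulated groups until |d_h| >= n_r.
            Z_a = sorted(set().union(*(z for _, _, z in D_a)))
            rng.shuffle(Z_a)
            held = set()
            while len(held) < n_r and Z_a:
                z = Z_a.pop()
                held |= {inst[0] for inst in D_a if z in inst[2]}
            pool = [inst for inst in D_a if inst[0] not in held]
            if len(held) < n_r or not pool:
                raise ValueError(f"class {a!r} cannot satisfy n_r={n_r}")
        else:
            raise RuntimeError(f"no valid split found for class {a!r}")
        # Phase 4: concatenate the per-class samples.
        D_t.extend(d_t)
        D_p.extend(d_p)
    return D_t, D_p
```

Calling `regulated_bootstrap(D, n_r=10, rng=random.Random(0))` on a list of such tuples would return a pair whose test instances share no regulated value with the training instances of the same class. The two error branches cover cases the listing leaves open, such as a class too small to hold out nr instances.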
Figure 2

Artist distribution across classes in GTZAN, showing the number of unique artists (Top) and the quartiles of the number of excerpts per artist (Bottom) in each class. Dots indicate outliers.

Table 1

Estimated proportion of train/test samples requiring curated sampling for each GTZAN class if drawn using Alg. 1 to regulate over artists, from 100,000 simulations with nr = 10.

Figure 3

Distribution of the number of unique excerpts (Top) and artists (Bottom) per class in the training and testing collections sampled from GTZAN using bootstrap regulated over artists.

Figure 4

Mean recall (± standard deviation) in train, test, and pr. test for each regulated bootstrap iteration over all combinations of feature extraction and learning algorithms on original GTZAN recordings. Position 0 represents the mean recall over all iterations.

Figure 5

Quartiles of (mean) recall distribution obtained in train, test, and pr. test, marginalised over GTZAN class (Top), feature set (Middle), and learning algorithm (Bottom).

Figure 6

Relationship between mean recall in test and pr. test obtained by systems constructed with different combinations of feature representations and learning algorithms on training collections sampled from GTZAN with bootstrap regulated over artists, represented both as individual values for each system (Left) and averages across iterations (Right). The dashed line indicates the case of equal mean recall in test and pr. test; the solid line indicates the linear regression model fitting the data as in Eq. (1).

Figure 7

Quartiles of (mean) recall distribution obtained in train, train (filt.), test, and test (filt.), marginalised over GTZAN class (Top), feature set (Middle), and learning algorithm (Bottom). Note that colours in this figure that do not match those in Figs. 3, 4 and 5 correspond to different evaluation conditions.

Figure 8

Relationship between mean recall in test and test (filt.) obtained by systems constructed with different combinations of feature representations and learning algorithms using training collections sampled from GTZAN with bootstrap regulated over artists, grouped by the source of feature set. Non-Scattering features are extracted with essentia. Instance-level scattering features correspond to Des. 1-L Sc.; the rest are frame-level. The dashed line indicates the case of equal mean recall in test and test (filt.).

Figure 9

Quartiles of (mean) recall distribution obtained in test, test (filt.), pr. test, and pr. test (filt.), marginalised over GTZAN class (Top), feature set (Middle), and learning algorithm (Bottom). Note that colours in this figure that do not match those in Figs. 3, 4, 5 and 7 correspond to different evaluation conditions.

Figure 10

Distribution of differences between the real variation ΔR and the accumulated variation ΔA in mean recall for artist and infrasonic regulation interventions in GTZAN, grouped by the source of feature set.

Figure 11

Interaction between learning algorithm and evaluation condition in average mean recall for systems constructed using training collections sampled from GTZAN with bootstrap regulated over artists and 1&2-L Sc. feature representations.

DOI: https://doi.org/10.5334/tismir.24 | Journal eISSN: 2514-3298
Language: English
Submitted on: Oct 19, 2018
Accepted on: Jun 27, 2019
Published on: Aug 21, 2019
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2019 Francisco Rodríguez-Algarra, Bob L. Sturm, Simon Dixon, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.