Characterising Confounding Effects in Music Classification Experiments through Interventions

Francisco Rodríguez-Algarra; Bob L. Sturm; Simon Dixon

doi:10.5334/tismir.24

Figures & Tables

Pipeline of a single iteration k of a classification experiment evaluating a system construction method m (a combination of feature extraction and learning algorithm) on a music collection D. Square-shaped nodes represent data structures; diamond-shaped nodes represent processes. A double border indicates a treatment factor with a fixed level. Solid lines indicate information flow; dashed lines join components of the same data structure. π is a data assignment/partitioning function. *D_t* is the training collection; *D_p* is the testing collection, with *R_p* the raw data (e.g., recordings) and *A_p* the corresponding annotations. (*R_t* and *A_t* are omitted for simplicity.) s is the trained system, ${\hat{A}}_{p}$ the predicted annotations, ϕ the performance metric function, and $\hat{y}$ an estimate of the theoretical performance y, i.e., the performance given the true distribution.
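The information flow in this pipeline can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`run_iteration`, `holdout_split`, `majority_class`, `accuracy`) and the toy collection are hypothetical, and the majority-class "system" merely stands in for a real combination of feature extraction and learning algorithm.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[str, str]  # (raw data r, annotation a)
System = Callable[[str], str]


def run_iteration(
    D: Sequence[Instance],
    pi: Callable[[Sequence[Instance]], Tuple[Sequence[Instance], Sequence[Instance]]],
    m: Callable[[Sequence[Instance]], System],
    phi: Callable[[List[str], List[str]], float],
) -> float:
    """One iteration: partition D, build s with m, predict, score with phi."""
    D_t, D_p = pi(D)                  # data assignment/partitioning
    s = m(D_t)                        # trained system
    R_p = [r for r, _ in D_p]         # raw test data
    A_p = [a for _, a in D_p]         # test annotations
    A_hat_p = [s(r) for r in R_p]     # predicted annotations
    return phi(A_p, A_hat_p)          # performance estimate y_hat


# Hypothetical stand-ins for pi, m and phi.
def holdout_split(D):
    cut = len(D) // 2
    return D[:cut], D[cut:]


def majority_class(D_t):
    label = Counter(a for _, a in D_t).most_common(1)[0][0]
    return lambda r: label


def accuracy(A_p, A_hat_p):
    return sum(a == b for a, b in zip(A_p, A_hat_p)) / len(A_p)


toy = [("r1", "rock"), ("r2", "rock"), ("r3", "jazz"),
       ("r4", "rock"), ("r5", "jazz"), ("r6", "rock")]
y_hat = run_iteration(toy, holdout_split, majority_class, accuracy)
```

Passing π, m and ϕ as arguments mirrors the caption's view of them as separate components of the experiment that can be fixed or varied independently.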

Algorithm 1

Regulated Bootstrap resampling strategy, given a collection D and a threshold n_r ∈ ℕ.

RegulatedBootstrap(D, n_r):
- Initialise: D_t ← (∅), D_p ← (∅)
- For each a ∈ A:
    0. Define D_a as the instances in D with a_i = a;
    1. Phase 1: Stratified Bootstrap Sampling
        (a) Create d_t by uniformly sampling with replacement |D_a| instances from D_a;
        (b) Create d_p ← D_a\d_t;
    2. Phase 2: Size Verification
        (a) Define Z_t as the union of all z_i in d_t;
        (b) Create d'_p by selecting all instances (r, a, z)_i in d_p with z_i not in Z_t;
        (c) If |d'_p| < n_r, proceed to Phase 3, as it lacks enough regulated instances; otherwise, go to Phase 4;
    3. Phase 3: Curated Sampling
        (a) Define Z_a as the union of all z_i in D_a;
        (b) Initialise a hold-out collection d_h ← (∅);
        (c) Randomly select a z ∈ Z_a, and remove it from Z_a;
        (d) Define d_z as the instances in D_a with z ∈ z_i;⁵
        (e) Append d_z to d_h: d_h ← d_h ⌢ d_z;
        (f) If |d_h| < n_r, go to (3c), as d_h still lacks enough instances;
        (g) Create d_t by uniformly sampling with replacement |D_a| instances from D_a\d_h;
        (h) Create d_p ← D_a\d_t;
        (i) Go to Phase 2 to check size requirements;
    4. Phase 4: Concatenation
        (a) Append d_t to D_t: D_t ← D_t ⌢ d_t;
        (b) Append d_p to D_p: D_p ← D_p ⌢ d_p;
- Return: train/test pair (D_t, D_p)
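
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation, and it rests on several assumptions: each instance is a tuple (r, a, z) whose z is a `frozenset` of regulated values (for example, artist names); "z_i not in Z_t" is interpreted as z_i sharing no value with Z_t; instances are tracked by index, so the hold-out collection d_h never contains duplicates; and the `max_attempts` cap and the two exceptions are hypothetical safeguards that Algorithm 1 does not specify.

```python
import random
from typing import FrozenSet, List, Optional, Sequence, Tuple

# One instance (r, a, z)_i: raw data id, annotation, set of regulated values.
Instance = Tuple[str, str, FrozenSet[str]]


def regulated_bootstrap(
    D: Sequence[Instance],
    n_r: int,
    rng: Optional[random.Random] = None,
    max_attempts: int = 100,  # hypothetical safety cap, not in Algorithm 1
) -> Tuple[List[Instance], List[Instance]]:
    rng = rng or random.Random()
    D_t: List[Instance] = []
    D_p: List[Instance] = []
    for a in sorted({inst[1] for inst in D}):
        D_a = [inst for inst in D if inst[1] == a]
        n = len(D_a)
        # Phase 1: bootstrap sample of indices into D_a, with replacement.
        idx_t = rng.choices(range(n), k=n)
        for _ in range(max_attempts):
            sampled = set(idx_t)
            idx_p = [i for i in range(n) if i not in sampled]  # d_p = D_a \ d_t
            # Phase 2: count test instances sharing no z value with d_t.
            Z_t = set().union(*(D_a[i][2] for i in idx_t))
            regulated = [i for i in idx_p if D_a[i][2].isdisjoint(Z_t)]
            if len(regulated) >= n_r:
                break
            # Phase 3: hold out whole z groups until d_h has n_r instances.
            Z_a = sorted(set().union(*(inst[2] for inst in D_a)))
            rng.shuffle(Z_a)
            held = set()
            for z in Z_a:
                held |= {i for i in range(n) if z in D_a[i][2]}
                if len(held) >= n_r:
                    break
            pool = [i for i in range(n) if i not in held]  # D_a \ d_h
            if len(held) < n_r or not pool:
                raise ValueError(f"class {a!r} cannot satisfy n_r={n_r}")
            idx_t = rng.choices(pool, k=n)  # then back to Phase 2
        else:
            raise RuntimeError(f"no valid split found for class {a!r}")
        # Phase 4: concatenate the per-class samples.
        D_t.extend(D_a[i] for i in idx_t)
        D_p.extend(D_a[i] for i in idx_p)
    return D_t, D_p


# Toy collection: two classes, three artists each, two recordings per artist.
toy = [
    (f"{label}_{artist}_{k}", label, frozenset({artist}))
    for label, artists in (("jazz", "DEF"), ("rock", "ABC"))
    for artist in artists
    for k in range(2)
]
D_t, D_p = regulated_bootstrap(toy, n_r=2, rng=random.Random(0))
```

In this toy collection every instance has a single z value, so holding out one artist in Phase 3 already guarantees that the next size check passes. With multi-valued z_i, a held-out instance may still share a value with the training sample, which is why the sketch loops back to Phase 2 instead of assuming success.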