On End-to-End White-Box Adversarial Attacks in Music Information Retrieval

Open Access | Jul 2021

Figures & Tables

Figure 1

CNN architecture of the instrument classifier. Input: channels@mel bands × windows.

Table 1

Notation of variables.

Variable | Meaning
x | Original signal
y | Ground-truth label
X | Time-frequency representation of x
δ | Adversarial perturbation
x̂ = x + δ | Adversarial example
t | Target class/prediction
f | System (e.g., instrument classifier)
L_sys | System-specific loss function (e.g., cross-entropy loss)
∇_x | Gradient w.r.t. x
η | Multiplication factor for updates
ep | Current iteration
δ_ep | Perturbation during iteration ep
α | Weight factor for adversarial objective
ɛ | Clipping factor for updates
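
To make the notation in Table 1 concrete, the following is a minimal sketch of a generic iterative sign-gradient update of the perturbation δ. It is not taken from the article: the toy linear system standing in for f, the logistic loss standing in for L_sys, the function names, the default values of `eta`, `eps` and `max_ep`, and the choice to clip the accumulated perturbation to [-ɛ, ɛ] are all assumptions made for illustration.

```python
import math

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def toy_logit(x, w, b):
    """Hypothetical stand-in for the system f: a linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def loss_gradient(x, y, w, b):
    """Gradient w.r.t. x of the logistic (cross-entropy) loss, label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-toy_logit(x, w, b)))  # predicted probability of class 1
    return [(p - y) * wi for wi in w]

def iterative_sign_attack(x, y, w, b, eta=0.01, eps=0.05, max_ep=20):
    """Untargeted attack sketch: repeatedly step delta along the gradient sign."""
    delta = [0.0] * len(x)
    for ep in range(max_ep):
        x_hat = [xi + di for xi, di in zip(x, delta)]   # adversarial example x + delta
        grad = loss_gradient(x_hat, y, w, b)
        # Step of size eta in the direction that increases the loss.
        delta = [di + eta * sign(gi) for di, gi in zip(delta, grad)]
        # Assumption: eps bounds the accumulated perturbation element-wise.
        delta = [max(-eps, min(eps, di)) for di in delta]
    return delta

x = [0.2, -0.1, 0.4]
w = [1.0, -2.0, 0.5]
delta = iterative_sign_attack(x, 1, w, 0.0)
```

With a single iteration and a single large step this reduces to a one-step sign attack, whereas more iterations with a small η trade runtime for a finer search within the clipping bound.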
Table 2

Comparison of the adversarial attacks on our instrument classifier. Results are chosen based on the largest SNR with at least 150 (lines 4 to 7) and at least 180 (lines 8 to 11) successfully found adversarial examples out of 200. Shown are averages or the median over samples; for the PGD-Attack, C&W and Multi-Scale C&W, the average and standard deviation (marked *) of the results over five runs are stated in addition. Line 3 contains a baseline with random white noise instead of adversarial perturbations.

Samples Required | Data Origin | # Samples | Accuracy | SNR | Iterations
 | Clean | 200 | 0.835 | |
 | White-noise | 200 | 0.785 ± 0.000* | 42.71 ± 0.00* |
min. 150 | FGSM | 153 | 0.250 | -7.74 | 1.0
min. 150 | PGD-Attack | 151.8 ± 0.7* | 0.171 ± 0.004* | 40.13 ± 0.05* | 15.8 ± 0.4*
min. 150 | C&W | 153.2 ± 2.6* | 0.201 ± 0.016* | 44.23 ± 0.37* | 51.4 ± 2.7*
min. 150 | C&W_multi_scale | 163.6 ± 3.0* | 0.167 ± 0.012* | 43.82 ± 0.09* | 71.6 ± 5.4*
min. 180 | FGSM | 179 | 0.130 | -24.83 | 1.0
min. 180 | PGD-Attack | 190.8 ± 1.2* | 0.026 ± 0.004* | 16.47 ± 0.10* | 2.0 ± 0.0*
min. 180 | C&W | 180.2 ± 2.3* | 0.094 ± 0.010* | 42.98 ± 0.18* | 66.1 ± 3.7*
min. 180 | C&W_multi_scale | 196.4 ± 1.0* | 0.024 ± 0.004* | 39.49 ± 0.17* | 22.6 ± 1.0*
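
The SNR column can be read with a short worked example. The sketch below assumes the common definition of the signal-to-noise ratio in decibels, i.e. ten times the base-10 logarithm of the ratio between the power of the original signal x and the power of the perturbation δ; this page does not state the exact formula, so the definition and the function name `snr_db` are assumptions.

```python
import math

def snr_db(x, delta):
    """Signal-to-noise ratio in dB between a signal x and a perturbation delta."""
    signal_power = sum(v * v for v in x)
    noise_power = sum(v * v for v in delta)
    if noise_power == 0.0:
        return math.inf  # no perturbation at all
    return 10.0 * math.log10(signal_power / noise_power)

# A perturbation with one tenth of the signal amplitude.
quiet = snr_db([1.0, 1.0, 1.0, 1.0], [0.1, 0.1, 0.1, 0.1])
# A perturbation louder than the signal itself gives a negative value.
loud = snr_db([0.1, 0.1], [1.0, 1.0])
```

Under this definition, higher values mean a quieter perturbation relative to the signal, and a negative value, as in the FGSM rows, means the perturbation carries more power than the original audio.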
Figure 2

Confusion matrices computed on validation data, showing correct predictions on the diagonal and confusions off the diagonal. For samples without an adversarial counterpart, the original audio is used. Columns are ground-truth labels and rows are predictions; columns are normalised to sum to 1. Order of labels (left to right and top to bottom): Accordion, Acoustic guitar, Bass drum, Bass guitar, Electric guitar, Female singing, Glockenspiel, Gong, Harmonica, Hi-hat, Male singing, and Marimba/xylophone.
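
The layout described in this caption (rows are predictions, columns are ground-truth labels, each column sums to 1) can be sketched as follows. The function name and the toy labels are hypothetical and only illustrate the normalisation; they are not the article's data.

```python
def column_normalised_confusion(y_true, y_pred, n_classes):
    """Confusion matrix with rows = predictions, columns = ground truth.

    Each column is divided by its sum, so an entry is the fraction of
    samples of that ground-truth class that received a given prediction.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[p][t] += 1  # row index: prediction, column index: ground truth
    col_sums = [sum(counts[r][c] for r in range(n_classes)) for c in range(n_classes)]
    return [
        [counts[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n_classes)]
        for r in range(n_classes)
    ]

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
cm = column_normalised_confusion(y_true, y_pred, 3)
```

With this normalisation the diagonal entry of a column is the per-class recall, so a successful attack shows up as mass moving from the diagonal to off-diagonal rows.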

Table 3

Results of the adversarial C&W attack on the music recommendation system for varying hub-sizes. SNR and k-occurrence are given as mean ± standard deviation over all adversarial examples; the number of adversarial examples is given in column 3.

Hub-size | # Hubs (before) | # Hubs (after) | # Non-hubs (after) | SNR | k-occurrence
25 | 644 (4.1%) | 6,381 (40.5%) | 8,725 (55.4%) | 39.12 ± 5.50 | 48.50 ± 31.42
50 | 203 (1.3%) | 4,313 (27.4%) | 11,234 (71.3%) | 38.82 ± 5.02 | 85.34 ± 43.77
75 | 83 (0.5%) | 3,080 (19.6%) | 12,587 (79.9%) | 38.83 ± 4.58 | 119.55 ± 56.05
100 | 32 (0.2%) | 2,357 (15.0%) | 13,361 (84.8%) | 38.69 ± 4.33 | 153.05 ± 64.89
125 | 14 (0.1%) | 2,244 (14.2%) | 13,492 (85.7%) | 38.46 ± 4.18 | 183.03 ± 71.89
Figure 3

Histogram of changes in k-occurrence before and after the C&W attack on the music recommendation system for a hub-size of 25. Changes larger than zero denote an increase of the k-occurrence after an attack.
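
As a rough illustration of the quantities behind Table 3 and Figure 3, the sketch below assumes that the k-occurrence of an item is the number of other items that list it among their k nearest neighbours, and that an item counts as a hub once its k-occurrence reaches the hub-size. Both definitions, the function names and the toy data are assumptions for illustration; the page itself does not spell them out.

```python
def k_occurrence(distances, k):
    """Count how often each item appears in the k-nearest-neighbour lists of the others."""
    n = len(distances)
    counts = [0] * n
    for i in range(n):
        # Rank all other items by their distance to item i.
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: distances[i][j])
        for j in neighbours[:k]:
            counts[j] += 1
    return counts

def hubs(counts, hub_size):
    """Indices of items whose k-occurrence is at least hub_size (assumed threshold rule)."""
    return [i for i, c in enumerate(counts) if c >= hub_size]

def k_occurrence_change(before, after):
    """Per-item change; values above zero mean an increase after the attack."""
    return [a - b for b, a in zip(before, after)]

# Toy 1-D "items"; distance is the absolute difference.
points = [0.0, 1.0, 2.5, 10.0]
distances = [[abs(a - b) for b in points] for a in points]
before = k_occurrence(distances, k=1)
```

A histogram like the one in Figure 3 would then be drawn from the values returned by `k_occurrence_change` for all attacked items.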

DOI: https://doi.org/10.5334/tismir.85 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jan 11, 2021
Accepted on: May 28, 2021
Published on: Jul 7, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Katharina Prinz, Arthur Flexer, Gerhard Widmer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.