An Analysis of the Effect of Data Augmentation Methods: Experiments for a Musical Genre Classification Task

Open Access | Dec 2019

Figures & Tables

Figure 1

Flowchart of the musical genre classification method using data augmentation. The Global Features block, made of Descriptors and Summary, is computed for each element produced by the data augmentation block.
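The Global Features block of Figure 1 can be pictured as a minimal sketch. It assumes hypothetical stand-ins: two frame-wise descriptors (RMS energy and zero-crossing rate) and a mean/standard-deviation summary. These are not the descriptors used in the paper (the later tables name StdDesc, ModSpec and GMM, which are not detailed here). The point is only the structure: descriptors, then a summary, computed once per element produced by the augmentation block.

```python
import numpy as np

def frame_signal(x, frame_len=2048, hop=1024):
    """Split a 1-D signal into overlapping frames (shape: n_frames x frame_len)."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def descriptors(x):
    """Hypothetical frame-wise descriptors: RMS energy and zero-crossing rate."""
    frames = frame_signal(np.asarray(x, dtype=float))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([rms, zcr], axis=1)  # n_frames x n_descriptors

def summary(d):
    """Summarize frame-wise descriptors by their mean and standard deviation."""
    return np.concatenate([d.mean(axis=0), d.std(axis=0)])

def global_features(augmented_items):
    """One global feature vector per element produced by data augmentation."""
    return np.stack([summary(descriptors(x)) for x in augmented_items])
```

Each augmented element (an original song, a transformed copy, or a segment) yields its own row, so augmentation directly multiplies the number of training vectors seen by the classifier.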

Table 1

Convention for the transformation strength.

Γ or γ | Meaning
0 | no transformation
0.5 | very light transformations
1 | medium transformations
1.5 | strong transformations
2 | exaggerated degradations
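Table 1 fixes only the meaning of the strength values; how each transformation turns Γ or γ into concrete parameters is not given in this listing. A minimal sketch, assuming a hypothetical linear mapping (`scale_param`) in which Γ = 0 is the identity and Γ = 1 reaches a nominal "medium" range, could look like this:

```python
# Labels from Table 1 (convention for the transformation strength).
STRENGTH_LABELS = {
    0.0: "no transformation",
    0.5: "very light transformations",
    1.0: "medium transformations",
    1.5: "strong transformations",
    2.0: "exaggerated degradations",
}

def strength_label(gamma):
    """Return the Table 1 label of the tabulated strength closest to gamma."""
    if gamma < 0:
        raise ValueError("strength must be non-negative")
    nearest = min(STRENGTH_LABELS, key=lambda g: abs(g - gamma))
    return STRENGTH_LABELS[nearest]

def scale_param(medium_value, gamma):
    """Hypothetical linear mapping: gamma = 1 gives medium_value, gamma = 0 gives 0."""
    return gamma * medium_value
```

For instance, a hypothetical pitch-shift range of 6 semitones at medium strength would shrink to 3 semitones at Γ = 0.5 under this mapping.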
Table 2

Evaluation of segmentation. Accuracy (%) for ISMIR-2004.

train \ test | no seg. | 80 s | 30 s | 15 s
no seg. | 81.0 | 84.3 | 84.9 | 81.2
80 s | 83.7 | 86.5 | 86.8 | 85.3
30 s | 85.6 | 86.5 | 89.3 | 86.1
15 s | 85.3 | 85.3 | 87.2 | 87.3
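Segmentation as data augmentation cuts each song into fixed-length pieces (80 s, 30 s or 15 s in Table 2). A minimal sketch follows; the choices of non-overlapping segments, dropping a trailing remainder shorter than one segment, and keeping short songs whole are assumptions, not details stated in this listing.

```python
import numpy as np

def segment(x, sr, seg_seconds, hop_seconds=None):
    """Cut a signal into fixed-length segments.

    Assumptions: segments do not overlap unless hop_seconds is given,
    a trailing remainder shorter than one segment is dropped, and a
    signal shorter than one segment is returned whole.
    """
    seg_len = int(round(seg_seconds * sr))
    hop = seg_len if hop_seconds is None else int(round(hop_seconds * sr))
    if len(x) <= seg_len:
        return [x]
    starts = range(0, len(x) - seg_len + 1, hop)
    return [x[s:s + seg_len] for s in starts]

if __name__ == "__main__":
    sr = 22050
    song = np.zeros(200 * sr)  # stand-in for a 200 s mono signal
    print([len(segment(song, sr, s)) for s in (80, 30, 15)])  # [2, 6, 13]
```

Shorter segments give more training items per song, which is one way to read the rows of Table 2.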
Table 3

Evaluation of transformations. Accuracy mean (%) for ISMIR-2004, using a transformation strength Γ* = 1. The small numbers in parentheses are the standard deviations (in percentage points, pp) computed with 25 repetitions. Note that the 95% confidence interval of the accuracy mean is less than 0.37pp.

train \ test | original | +1 transf. | +2 transf. | +4 transf. | +14 transf.
original | 81.0 (0.0) | 78.7 (0.6) | 77.6 (0.7) | 77.2 (0.8) | 76.3 (0.5)
+1 transf. | 84.5 (0.6) | 83.6 (0.7) | 83.4 (0.9) | 83.3 (0.7) | 81.9 (0.6)
+2 transf. | 84.5 (0.8) | 84.7 (0.7) | 84.8 (0.8) | 84.4 (0.7) | 84.6 (0.6)
+4 transf. | 85.0 (0.9) | 84.2 (0.9) | 84.3 (0.5) | 85.4 (0.7) | 85.4 (0.6)
+14 transf. | 84.4 (0.4) | 84.9 (0.6) | 85.1 (0.6) | 85.5 (0.7) | 85.8 (0.4)
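The caption of Table 3 bounds the 95% confidence interval of the mean using the standard deviations from 25 repetitions. A minimal sketch of that arithmetic, assuming the normal approximation z · s / √n with z = 1.96 (the listing does not state which quantile was used; a Student t quantile for 24 degrees of freedom would give a slightly wider interval):

```python
import math

def ci95_halfwidth(std_pp, n_repetitions=25, z=1.96):
    """Half-width (pp) of the 95% confidence interval of a mean accuracy,
    using the normal approximation z * s / sqrt(n)."""
    return z * std_pp / math.sqrt(n_repetitions)

# Largest standard deviation reported in Table 3 (pp).
largest_std = 0.9
print(round(ci95_halfwidth(largest_std), 3))  # 0.353, below the 0.37 pp bound
```

With 25 repetitions the interval is about 0.39 times the standard deviation, so differences of a few tenths of a point in the tables sit near the resolution of the experiment.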
Table 4

Testing combinations of segmentation and transformation. Accuracy mean (%) for ISMIR-2004. The symbols S and T respectively mean that segmentation or transformation is used during training (rows) or testing (columns), and the symbols S̄ and T̄ denote that the respective method is not used. Note that for the experiments with sound transformations, the standard deviations of the accuracy are less than 0.94pp and the 95% confidence intervals of the accuracy mean are less than 0.37pp.

train \ test | S̄ T̄ | S̄ T | S T̄ | S T
S̄ T̄ | 81.0 | 77.2 | 84.3 | 78.1
S̄ T | 85.0 | 85.4 | 84.4 | 85.4
S T̄ | 83.7 | 81.8 | 86.5 | 83.2
S T | 85.8 | 86.3 | 86.3 | 87.1
Table 5

Natural vs artificial data augmentation. Accuracy mean (%) for FMA (10-fold cross validation). The first column corresponds to small training sets with 1000 songs, and the second to larger training sets with 5000 songs. Note that the results in bold font use training sets with the same size.

Augmentation | Small (1000) | Big (5000)
No augmentation | 45.8 | 54.9
Segmentation | 48.6 | 55.2
Transformation | 48.5 | 54.7
Table 6

Robustness to degradation, shown by mean prediction accuracy (%) for ISMIR-2004. Rows represent the amount of data augmentation; columns represent the transformation strength Γ*. The standard deviations are less than 0.88pp and the 95% confidence intervals of the mean are less than 0.35pp.

train \ Γ* for testing | 0 | 0.5 | 1 | 1.5 | 2
original | 86.5 | 73.2 | 74.2 | 71.7 | 68.9
+1 transf. | 86.3 | 85.2 | 84.5 | 82.7 | 81.1
+2 transf. | 86.4 | 86.0 | 85.6 | 83.9 | 82.1
+4 transf. | 86.3 | 86.8 | 86.0 | 84.7 | 83.0
Figure 2

Individual and chained transformations. Each horizontal colored bar and black segment represents the mean accuracy and its standard deviation computed with 25 repetitions of each experiment. The vertical dashed line represents the accuracy without transformation, and the dotted line represents the mean accuracy for the transformation chain used in this paper.
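Figure 2 compares individual transformations with a chain of them. The transformations used in the paper are not listed here, so the sketch below uses three hypothetical stand-ins (random gain, a one-pole low-pass filter, additive white noise) with made-up parameter ranges. Each one scales with the strength γ and is the identity at γ = 0, in line with Table 1; the chain simply applies them one after the other.

```python
import numpy as np

def random_gain(x, gamma, rng):
    """Random gain of at most +/- 3*gamma dB (hypothetical range)."""
    gain_db = rng.uniform(-3.0, 3.0) * gamma
    return x * 10 ** (gain_db / 20)

def lowpass(x, gamma, rng):
    """One-pole low-pass filter; smoothing grows with gamma (hypothetical mapping)."""
    a = min(0.45 * gamma, 0.9)  # a = 0 leaves the signal unchanged
    y = np.empty_like(x)
    prev = 0.0
    for i, v in enumerate(x):
        prev = (1 - a) * v + a * prev
        y[i] = prev
    return y

def add_noise(x, gamma, rng):
    """Additive white noise whose SNR drops as gamma grows (hypothetical mapping)."""
    if gamma == 0:
        return x
    snr_db = 40.0 - 15.0 * gamma  # gamma = 2 gives 10 dB
    power = np.mean(x ** 2)
    noise = rng.standard_normal(len(x)) * np.sqrt(power / 10 ** (snr_db / 10))
    return x + noise

def transform_chain(x, gamma, rng, chain=(random_gain, lowpass, add_noise)):
    """Apply the transformations one after the other at strength gamma."""
    x = np.asarray(x, dtype=float)
    for t in chain:
        x = t(x, gamma, rng)
    return x
```

Calling `transform_chain` k times with different random draws gives the "+k transf." copies of a song; raising γ at test time mimics the degradation sweep of Table 6.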

Figure 3

Testing of transformation overfitting. The rows represent the transformations used during training, and the columns represent the transformations of the test signals (ISMIR-2004).

Table 7

Classification accuracy (%), showing the effect of transformations for cross-dataset issues. The SVM parameters C and σ are fixed to 1.

Training set | ISMIR-2004 test set (only original) | 1517-Artists test set (only original)
ISMIR-2004, original | 85.0 | 40.6
ISMIR-2004, +2 transf. | 85.3 | 46.5
1517-Artists, original | 57.0 | 58.6
1517-Artists, +2 transf. | 56.9 | 63.1
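Table 7 fixes the SVM parameters C and σ to 1. Assuming a Gaussian kernel written with a width σ as k(x, y) = exp(−‖x − y‖² / (2σ²)) (the kernel definition is not given in this listing), σ = 1 corresponds to gamma = 0.5 in libraries that write the kernel as exp(−gamma · ‖x − y‖²), such as scikit-learn's `SVC(kernel="rbf", C=1.0, gamma=0.5)`. Under the other common convention, exp(−‖x − y‖² / σ²), gamma would be 1 instead. A small sketch of the conversion:

```python
import math

def sigma_to_gamma(sigma):
    """Convert a Gaussian width sigma to 'gamma' in exp(-gamma * ||x - y||^2),
    assuming k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    return 1.0 / (2.0 * sigma ** 2)

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors, width sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

print(sigma_to_gamma(1.0))  # 0.5
```

Fixing the kernel width rather than tuning it per dataset keeps the cross-dataset comparison from depending on a grid search.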
Table 8

Evaluation of transformations. Accuracy mean (%) for ISMIR-2004, Γ* = 1, using: StdDesc+ModSpec+GMM. The small numbers given between parentheses are the standard deviations (pp) computed with 25 repetitions. Note that the 95% confidence interval of the accuracy mean is less than 0.49pp.

train \ test | original | +1 transf. | +2 transf. | +4 transf. | +14 transf.
original | 83.0 (0.6) | 79.8 (0.9) | 79.1 (0.9) | 78.9 (1.2) | 78.5 (1.1)
+1 transf. | 82.7 (0.9) | 82.8 (0.9) | 83.1 (0.8) | 83.4 (0.9) | 83.6 (0.9)
+2 transf. | 83.3 (1.0) | 83.3 (0.8) | 83.6 (0.8) | 84.3 (1.1) | 84.5 (0.8)
+4 transf. | 83.1 (0.5) | 83.5 (0.8) | 84.0 (0.8) | 84.4 (0.7) | 84.8 (0.6)
+14 transf. | 83.7 (0.9) | 84.1 (0.7) | 84.7 (0.7) | 85.0 (0.5) | 85.3 (0.6)
Table 9

Evaluation of segmentation. Accuracy (%) for ISMIR-2004, using: StdDesc+ModSpec+GMM. The small numbers given between parentheses are the standard deviations (pp) computed with 25 repetitions. Note that the 95% confidence interval of the accuracy mean is less than 0.4pp.

train \ test | no seg. | 80 s | 30 s | 15 s
no seg. | 83.9 (0.8) | 83.8 (1.0) | 83.6 (0.8) | 82.0 (1.0)
80 s | 85.2 (0.6) | 86.4 (0.4) | 86.3 (0.5) | 85.8 (0.4)
30 s | 85.3 (0.3) | 85.9 (0.4) | 86.9 (0.1) | 86.9 (0.1)
15 s | 84.8 (0.3) | 85.6 (0.2) | 85.8 (0.4) | 85.9 (0.2)
Table 10

Testing combinations of segmentation and transformation. Accuracy (%) for ISMIR-2004, using: StdDesc+ModSpec+GMM. cf. Table 4 for an explanation. Note that the 95% confidence interval of the accuracy mean is less than 0.43pp.

train \ test | S̄ T̄ | S̄ T | S T̄ | S T
S̄ T̄ | 83.1 (0.7) | 78.6 (1.1) | 83.2 (0.5) | 80.3 (0.8)
S̄ T | 83.1 (0.7) | 84.4 (0.6) | 83.5 (0.7) | 85.1 (0.7)
S T̄ | 85.3 (0.3) | 83.2 (0.7) | 85.8 (0.6) | 84.4 (0.5)
S T | 83.8 (0.6) | 84.3 (0.6) | 84.3 (0.7) | 85.0 (0.6)
Table 11

Natural vs artificial data augmentation. Accuracy (%) for FMA using StdDesc+ModSpec+GMM. cf. Table 5 for an explanation.

Augmentation | Small (1000) | Big (5000)
No augmentation | 41.7 | 54.0
Segmentation | 48.2 | 54.0
Transformations | 48.0 | 53.4
Table 12

Robustness to degradation, shown by mean prediction accuracy (%) for ISMIR-2004, using StdDesc+ModSpec+GMM. Rows represent the amount of data augmentation; columns represent the transformation strength Γ*. The standard deviations are less than 1.05pp and the 95% confidence intervals of the mean are less than 0.41pp.

train \ Γ* for testing | 0 | 0.5 | 1 | 1.5 | 2
original | 85.8 | 79.9 | 78.1 | 75.8 | 72.2
+1 transf. | 84.9 | 84.2 | 83.7 | 82.5 | 80.6
+2 transf. | 84.4 | 84.3 | 83.8 | 83.1 | 81.4
+4 transf. | 84.2 | 84.2 | 84.4 | 83.5 | 81.6
Table 13

Classification accuracy (%), showing the effect of transformations for cross-dataset issues, using StdDesc+ModSpec+GMM.

Training set | ISMIR-2004 test set (only original) | 1517-Artists test set (only original)
ISMIR-2004, original | 86.1 | 47.8
ISMIR-2004, +2 transf. | 85.2 | 47.5
1517-Artists, original | 46.7 | 60.7
1517-Artists, +2 transf. | 56.7 | 62.9
DOI: https://doi.org/10.5334/tismir.26 | Journal eISSN: 2514-3298
Language: English
Submitted on: Dec 21, 2018
Accepted on: Aug 8, 2019
Published on: Dec 18, 2019
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2019 Rémi Mignot, Geoffroy Peeters, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.