
An Analysis of the Effect of Data Augmentation Methods: Experiments for a Musical Genre Classification Task

Open Access
Dec 2019

References

  1. Barlow, R. J. (1989). Statistics: A guide to the use of statistical methods in the physical sciences, volume 29. John Wiley & Sons.
  2. Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press. DOI: 10.1201/9781420050646.ptb6
  3. Boser, B. E., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In ACM Conference on Computational Learning Theory, pages 144–152. DOI: 10.1145/130385.130401
  4. Chang, E. I., & Lippmann, R. P. (1995). Using voice transformations to create additional training talkers for word spotting. In Advances in Neural Information Processing Systems, pages 875–882.
  5. Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27. DOI: 10.1109/TIT.1967.1053964
  6. Cui, X., Goel, V., & Kingsbury, B. (2015). Data augmentation for deep neural network acoustic modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(9), 1469–1477. DOI: 10.1109/TASLP.2015.2438544
  7. Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4), 357–366. DOI: 10.1109/TASSP.1980.1163420
  8. Defferrard, M., Benzi, K., Vandergheynst, P., & Bresson, X. (2017). FMA: A dataset for music analysis. In International Society for Music Information Retrieval Conference, pages 316–323. https://github.com/mdeff/fma
  9. Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. Wiley, New York, 2nd edition.
  10. Feng, Y., Zhuang, Y., & Pan, Y. (2003). Music information retrieval by detecting mood via computational media aesthetics. In IEEE International Conference on Web Intelligence, pages 235–241.
  11. Flexer, A. (2007). A closer look on artist filters for musical genre classification. In International Conference on Music Information Retrieval.
  12. Fu, Z., Lu, G., Ting, K. M., & Zhang, D. (2011). A survey of audio-based music classification and annotation. IEEE Transactions on Multimedia, 13(2), 303–319. DOI: 10.1109/TMM.2010.2098858
  13. Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417. DOI: 10.1037/h0071325
  14. Humphrey, E. J., & Bello, J. P. (2012). Rethinking automatic chord recognition with convolutional neural networks. In 11th International Conference on Machine Learning and Applications (ICMLA), volume 2, pages 357–362. DOI: 10.1109/ICMLA.2012.220
  15. Jaitly, N., & Hinton, G. E. (2013). Vocal tract length perturbation (VTLP) improves speech recognition. In ICML Workshop on Deep Learning for Audio, Speech and Language, volume 117.
  16. Kanda, N., Takeda, R., & Obuchi, Y. (2013). Elastic spectral distortion for low resource speech recognition with deep neural networks. In IEEE Workshop on Automatic Speech Recognition and Understanding, pages 309–314. DOI: 10.1109/ASRU.2013.6707748
  17. Kirchhoff, H., Dixon, S., & Klapuri, A. (2012). Multi-template shift-variant non-negative matrix deconvolution for semi-automatic music transcription. In International Society for Music Information Retrieval Conference, pages 415–420. DOI: 10.1109/ICASSP.2012.6287833
  18. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
  19. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. DOI: 10.1109/5.726791
  20. Lee, C.-H., Shih, J.-L., Yu, K.-M., & Lin, H.-S. (2009a). Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features. IEEE Transactions on Multimedia, 11(4), 670–682. DOI: 10.1109/TMM.2009.2017635
  21. Lee, H., Pham, P., Largman, Y., & Ng, A. Y. (2009b). Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in Neural Information Processing Systems, pages 1096–1104.
  22. Lee, K., & Slaney, M. (2008). Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 291–301. DOI: 10.1109/TASL.2007.914399
  23. Li, T. L. H., & Chan, A. B. (2011). Genre classification and the invariance of MFCC features to key and tempo. In International Conference on Multimedia Modeling, pages 317–327. DOI: 10.1007/978-3-642-17832-0_30
  24. Lidy, T., Rauber, A., Pertusa, A., & Quereda, J. (2007). Improving genre classification by combination of audio and symbolic descriptors using a transcription system. In International Conference on Music Information Retrieval, pages 61–66.
  25. Mandel, M. I., & Ellis, D. P. (2008). Multiple-instance learning for music information retrieval. In International Conference on Music Information Retrieval, pages 577–582.
  26. Marchand, U., & Peeters, G. (2014). The modulation scale spectrum and its application to rhythm-content description. In International Conference on Digital Audio Effects, pages 167–172.
  27. Mauch, M., & Ewert, S. (2013). The Audio Degradation Toolbox and its application to robustness evaluation. In International Society for Music Information Retrieval Conference, pages 83–88.
  28. McFee, B., Humphrey, E. J., & Bello, J. P. (2015). A software framework for musical data augmentation. In International Society for Music Information Retrieval Conference, pages 248–254.
  29. Ness, S. R., Theocharis, A., Tzanetakis, G., & Martins, L. G. (2009). Improving automatic music tag annotation using stacked generalization of probabilistic SVM outputs. In Proceedings of the 17th ACM International Conference on Multimedia, pages 705–708. DOI: 10.1145/1631272.1631393
  30. Oppenheim, A. V., & Schafer, R. W. (2009). Discrete-Time Signal Processing. Prentice Hall, 3rd edition.
  31. Orfanidis, S. J. (2005). High-order digital parametric equalizer design. Journal of the Audio Engineering Society, 53(11), 1026–1046.
  32. Peeters, G. (2007). A generic system for audio indexing: Application to speech/music segmentation and music genre recognition. In International Conference on Digital Audio Effects, pages 205–212.
  33. Peeters, G., Giordano, B., Susini, P., Misdariis, N., & McAdams, S. (2011). The Timbre Toolbox: Extracting audio descriptors from musical signals. The Journal of the Acoustical Society of America, 130(5), 2902–2916. DOI: 10.1121/1.3642604
  34. Peeters, G., & Rodet, X. (2003). Hierarchical Gaussian tree with inertia ratio maximization for the classification of large musical instrument databases. In International Conference on Digital Audio Effects.
  35. Quatieri, T. F., & McAulay, R. J. (1992). Shape invariant time-scale and pitch modification of speech. IEEE Transactions on Signal Processing, 40(3), 497–510. DOI: 10.1109/78.120793
  36. Ragni, A., Knill, K. M., Rath, S. P., & Gales, M. J. (2014). Data augmentation for low resource languages. In 15th Annual Conference of the International Speech Communication Association, pages 810–814.
  37. Röbel, A. (2003). Transient detection and preservation in the phase vocoder. In International Computer Music Conference (ICMC), pages 247–250.
  38. Röbel, A., & Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35.
  39. Schlüter, J. (2016). Learning to pinpoint singing voice from weakly labeled examples. In International Society for Music Information Retrieval Conference, pages 44–50.
  40. Schlüter, J., & Grill, T. (2015). Exploring data augmentation for improved singing voice detection with neural networks. In International Society for Music Information Retrieval Conference, pages 121–126.
  41. Seyerlehner, K., & Schedl, M. (2014). MIREX 2014: Optimizing the fluctuation pattern extraction process. Technical report, Dept. of Computational Perception, Johannes Kepler University, Linz, Austria.
  42. Seyerlehner, K., Widmer, G., & Pohle, T. (2010a). Fusing block-level features for music similarity estimation. In International Conference on Digital Audio Effects, pages 225–232.
  43. Seyerlehner, K., Widmer, G., Schedl, M., & Knees, P. (2010b). Automatic music tag classification based on block-level features. In 7th Sound and Music Computing Conference.
  44. Simard, P. Y., Steinkraus, D., & Platt, J. C. (2003). Best practices for convolutional neural networks applied to visual document analysis. In International Conference on Document Analysis and Recognition, volume 3, pages 958–962. DOI: 10.1109/ICDAR.2003.1227801
  45. Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302. DOI: 10.1109/TSA.2002.800560
  46. Yaeger, L. S., Lyon, R. F., & Webb, B. J. (1997). Effective training of a neural network character classifier for word recognition. In Advances in Neural Information Processing Systems, pages 807–816.
  47. Zölzer, U. (2011). DAFX: Digital Audio Effects. John Wiley & Sons. DOI: 10.1002/9781119991298
DOI: https://doi.org/10.5334/tismir.26 | Journal eISSN: 2514-3298
Language: English
Submitted on: Dec 21, 2018
Accepted on: Aug 8, 2019
Published on: Dec 18, 2019
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2019 Rémi Mignot, Geoffroy Peeters, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.