References
1. Barlow, R. J. (1989). Statistics: A guide to the use of statistical methods in the physical sciences, volume 29. John Wiley & Sons.
2. Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press. DOI: 10.1201/9781420050646.ptb6
3. Boser, B. E., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In ACM Conference on Computational Learning Theory, pages 144–152. DOI: 10.1145/130385.130401
4. Chang, E. I., & Lippmann, R. P. (1995). Using voice transformations to create additional training talkers for word spotting. In Advances in Neural Information Processing Systems, pages 875–882.
5. Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27. DOI: 10.1109/TIT.1967.1053964
6. Cui, X., Goel, V., & Kingsbury, B. (2015). Data augmentation for deep neural network acoustic modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(9), 1469–1477. DOI: 10.1109/TASLP.2015.2438544
7. Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357–366. DOI: 10.1109/TASSP.1980.1163420
8. Defferrard, M., Benzi, K., Vandergheynst, P., & Bresson, X. (2017). FMA: A dataset for music analysis. In International Society for Music Information Retrieval Conference, pages 316–323. https://github.com/mdeff/fma
9. Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. Wiley, New York, 2nd edition.
10. Feng, Y., Zhuang, Y., & Pan, Y. (2003). Music information retrieval by detecting mood via computational media aesthetics. In IEEE International Conference on Web Intelligence, pages 235–241.
11. Flexer, A. (2007). A closer look on artist filters for musical genre classification. In International Conference on Music Information Retrieval.
12. Fu, Z., Lu, G., Ting, K. M., & Zhang, D. (2011). A survey of audio-based music classification and annotation. IEEE Transactions on Multimedia, 13(2), 303–319. DOI: 10.1109/TMM.2010.2098858
13. Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417. DOI: 10.1037/h0071325
14. Humphrey, E. J., & Bello, J. P. (2012). Rethinking automatic chord recognition with convolutional neural networks. In 11th International Conference on Machine Learning and Applications (ICMLA), volume 2, pages 357–362. DOI: 10.1109/ICMLA.2012.220
15. Jaitly, N., & Hinton, G. E. (2013). Vocal tract length perturbation (VTLP) improves speech recognition. In ICML Workshop on Deep Learning for Audio, Speech and Language, volume 117.
16. Kanda, N., Takeda, R., & Obuchi, Y. (2013). Elastic spectral distortion for low resource speech recognition with deep neural networks. In IEEE Workshop on Automatic Speech Recognition and Understanding, pages 309–314. DOI: 10.1109/ASRU.2013.6707748
17. Kirchhoff, H., Dixon, S., & Klapuri, A. (2012). Multi-template shift-variant non-negative matrix deconvolution for semi-automatic music transcription. In International Society for Music Information Retrieval Conference, pages 415–420. DOI: 10.1109/ICASSP.2012.6287833
18. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
19. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. DOI: 10.1109/5.726791
20. Lee, C.-H., Shih, J.-L., Yu, K.-M., & Lin, H.-S. (2009a). Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features. IEEE Transactions on Multimedia, 11(4), 670–682. DOI: 10.1109/TMM.2009.2017635
21. Lee, H., Pham, P., Largman, Y., & Ng, A. Y. (2009b). Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in Neural Information Processing Systems, pages 1096–1104.
22. Lee, K., & Slaney, M. (2008). Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 291–301. DOI: 10.1109/TASL.2007.914399
23. Li, T. L. H., & Chan, A. B. (2011). Genre classification and the invariance of MFCC features to key and tempo. In International Conference on Multimedia Modeling, pages 317–327. DOI: 10.1007/978-3-642-17832-0_30
24. Lidy, T., Rauber, A., Pertusa, A., & Quereda, J. (2007). Improving genre classification by combination of audio and symbolic descriptors using a transcription system. In International Conference on Music Information Retrieval, pages 61–66.
25. Mandel, M. I., & Ellis, D. P. (2008). Multiple-instance learning for music information retrieval. In International Conference on Music Information Retrieval, pages 577–582.
26. Marchand, U., & Peeters, G. (2014). The modulation scale spectrum and its application to rhythm content description. In International Conference on Digital Audio Effects, pages 167–172.
27. Mauch, M., & Ewert, S. (2013). The Audio Degradation Toolbox and its application to robustness evaluation. In International Society for Music Information Retrieval Conference, pages 83–88.
28. McFee, B., Humphrey, E. J., & Bello, J. P. (2015). A software framework for musical data augmentation. In International Society for Music Information Retrieval Conference, pages 248–254.
29. Ness, S. R., Theocharis, A., Tzanetakis, G., & Martins, L. G. (2009). Improving automatic music tag annotation using stacked generalization of probabilistic SVM outputs. In 17th ACM International Conference on Multimedia, pages 705–708. DOI: 10.1145/1631272.1631393
30. Oppenheim, A. V., & Schafer, R. W. (2009). Discrete-Time Signal Processing. Prentice Hall, 3rd edition.
31. Orfanidis, S. J. (2005). High-order digital parametric equalizer design. Journal of the Audio Engineering Society, 53(11), 1026–1046.
32. Peeters, G. (2007). A generic system for audio indexing: Application to speech/music segmentation and music genre recognition. In International Conference on Digital Audio Effects, pages 205–212.
33. Peeters, G., Giordano, B., Susini, P., Misdariis, N., & McAdams, S. (2011). The Timbre Toolbox: Extracting audio descriptors from musical signals. The Journal of the Acoustical Society of America, 130(5), 2902–2916. DOI: 10.1121/1.3642604
34. Peeters, G., & Rodet, X. (2003). Hierarchical Gaussian tree with inertia ratio maximization for the classification of large musical instrument databases. In International Conference on Digital Audio Effects.
35. Quatieri, T. F., & McAulay, R. J. (1992). Shape invariant time-scale and pitch modification of speech. IEEE Transactions on Signal Processing, 40(3), 497–510. DOI: 10.1109/78.120793
36. Ragni, A., Knill, K. M., Rath, S. P., & Gales, M. J. (2014). Data augmentation for low resource languages. In 15th Annual Conference of the International Speech Communication Association, pages 810–814.
37. Röbel, A. (2003). Transient detection and preservation in the phase vocoder. In International Computer Music Conference (ICMC), pages 247–250.
38. Röbel, A., & Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35.
39. Schlüter, J. (2016). Learning to pinpoint singing voice from weakly labeled examples. In International Society for Music Information Retrieval Conference, pages 44–50.
40. Schlüter, J., & Grill, T. (2015). Exploring data augmentation for improved singing voice detection with neural networks. In International Society for Music Information Retrieval Conference, pages 121–126.
41. Seyerlehner, K., & Schedl, M. (2014). MIREX 2014: Optimizing the fluctuation pattern extraction process. Technical report, Dept. of Computational Perception, Johannes Kepler University, Linz, Austria.
42. Seyerlehner, K., Widmer, G., & Pohle, T. (2010a). Fusing block-level features for music similarity estimation. In International Conference on Digital Audio Effects, pages 225–232.
43. Seyerlehner, K., Widmer, G., Schedl, M., & Knees, P. (2010b). Automatic music tag classification based on block-level features. In 7th Sound and Music Computing Conference.
44. Simard, P. Y., Steinkraus, D., & Platt, J. C. (2003). Best practices for convolutional neural networks applied to visual document analysis. In International Conference on Document Analysis and Recognition, volume 3, pages 958–962. DOI: 10.1109/ICDAR.2003.1227801
45. Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302. DOI: 10.1109/TSA.2002.800560
46. Yaeger, L. S., Lyon, R. F., & Webb, B. J. (1997). Effective training of a neural network character classifier for word recognition. In Advances in Neural Information Processing Systems, pages 807–816.
47. Zölzer, U. (2011). DAFx: Digital Audio Effects. John Wiley & Sons. DOI: 10.1002/9781119991298
