References
- 1Alpaydin, E. (2014). Introduction to Machine Learning. The MIT Press, Cambridge, MA, USA, 3rd edition.
- 2Andén, J., & Mallat, S. (2014). Deep Scattering Spectrum. IEEE Transactions on Signal Processing, 62(16), 4114–4128. DOI: 10.1109/TSP.2014.2326991
- 3Bogdanov, D., Porter, A., Herrera, P., & Serra, X. (2016). Cross-Collection Evaluation for Music Classification Tasks. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR’16), pages 379–385. New York City, NY, USA.
- 4Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., & Serra, X. (2013). Essentia: An Audio Analysis Library for Music Information Retrieval. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR’13), Curitiba, Brazil.
- 5Carterette, B. A. (2012). Multiple Testing in Statistical Analysis of Systems-based Information Retrieval Experiments. ACM Transactions on Information Systems, 30(1), 4:1–4:34. DOI: 10.1145/2094072.2094076
- 6Charalambous, C. C., & Bharath, A. A. (2016). A Data Augmentation Methodology for Training Machine/Deep Learning Gait Recognition Algorithms. In British Machine Vision Conference. DOI: 10.5244/C.30.110
- 7Chen, J. H., & Asch, S. M. (2017). Machine Learning and Prediction in Medicine – Beyond the Peak of Inflated Expectations. New England Journal of Medicine, 376(26), 2507–2509. DOI: 10.1056/NEJMp1702071
- 8Choi, K., Fazekas, G., Sandler, M., & Cho, K. (2017). Transfer Learning for Music Classification and Regression Tasks. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’17), Suzhou, China.
- 9Cobb, G. W. (1998). Design and Analysis of Experiments. Springer-Verlag.
- 10Davis, S. B., & Mermelstein, P. (1980). Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357–366. DOI: 10.1109/TASSP.1980.1163420
- 11Dixon, S., Gouyon, F., & Widmer, G. (2004). Towards Characterisation of Music via Rhythmic Patterns. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR’04), pages 509–517. Barcelona, Spain.
- 12Drummond, C. (2006). Machine Learning as an Experimental Science (Revisited). In Proceedings of the AAAI’06 Workshop on Evaluation for Machine Learning, Boston, MA, USA.
- 13Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26. DOI: 10.1214/aos/1176344552
- 14Efron, B. (1983). Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation. Journal of the American Statistical Association, 78(382), 316–331. DOI: 10.1080/01621459.1983.10477973
- 15Efron, B., & Tibshirani, R. (1997). Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association, 92(438), 548–560. DOI: 10.1080/01621459.1997.10474007
- 16Eugster, M. J. A. (2011). Benchmark Experiments. A Tool for Analyzing Statistical Learning Algorithms. PhD thesis, Ludwig-Maximilians-Universität München, München, Germany.
- 17Flach, P. (2012). Machine Learning. Cambridge University Press. DOI: 10.1017/CBO9780511973000
- 18Flexer, A. (2007). A Closer Look on Artist Filters for Musical Genre Classification. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR’07), Vienna, Austria.
- 19Flexer, A., & Schnitzer, D. (2010). Effects of Album and Artist Filters in Audio Similarity Computed for Very Large Music Databases. Computer Music Journal, 34(3), 20–28. DOI: 10.1162/COMJ_a_00004
- 20Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer, 2nd edition. DOI: 10.1007/978-0-387-84858-7
- 21Hernández-Orallo, J. (2016). Evaluation in Artificial Intelligence: From Task-Oriented to Ability-Oriented Measurement. Artificial Intelligence Review, 48(3), 397–447. DOI: 10.1007/s10462-016-9505-7
- 22Hothorn, T., Leisch, F., Zeileis, A., & Hornik, K. (2005). The Design and Analysis of Benchmark Experiments. Journal of Computational and Graphical Statistics, 14(3), 675–699. DOI: 10.1198/106186005X59630
- 23Kaufman, S., Rosset, S., & Perlich, C. (2011). Leakage in Data Mining: Formulation, Detection, and Avoidance. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’11), pages 556–563. San Diego, CA, USA. DOI: 10.1145/2020408.2020496
- 24Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1–2), 81–89. DOI: 10.2307/2332226
- 25Langley, P. (1988). Machine Learning as an Experimental Science. Machine Learning, 3(1), 5–8. DOI: 10.1007/BF00115008
- 26Marques, G., Domingues, M. A., Langlois, T., & Gouyon, F. (2011). Three Current Issues in Music Autotagging. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR’11), pages 795–800. Miami, FL, USA.
- 27Mendelson, A. F., Zuluaga, M. A., Lorenzi, M., Hutton, B. F., & Ourselin, S. (2017). Selection Bias in the Reported Performances of AD Classification Pipelines. NeuroImage: Clinical, 14, 400–416. DOI: 10.1016/j.nicl.2016.12.018
- 28Mishra, S., Sturm, B. L., & Dixon, S. (2017). Local Interpretable Model-Agnostic Explanations for Music Content Analysis. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’17), Suzhou, China.
- 29Montgomery, D. C. (2013). Design and Analysis of Experiments. John Wiley and Sons, 8th edition.
- 30Pampalk, E., Flexer, A., & Widmer, G. (2005). Improvements of Audio-Based Similarity and Genre Classification. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR’05), pages 628–633, London, UK.
- 31Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition. DOI: 10.1017/CBO9780511803161
- 32Pearl, J. (2014). Comment: Understanding Simpson’s Paradox. The American Statistician, 68(1), 8–13. DOI: 10.1080/00031305.2014.876829
- 33Pfungst, O., Stumpf, C., Rahn, C. L., & Angell, J. R. (1911). Clever Hans (the Horse of Mr. von Osten): A Contribution to Experimental, Animal, and Human Psychology. Journal of Philosophy, Psychology and Scientific Methods, 8(24), 663–666. DOI: 10.2307/2012691
- 34Rodríguez-Algarra, F., Sturm, B. L., & Maruri-Aguilar, H. (2016). Analysing Scattering-Based Music Classification Systems: Where’s the Music? In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR’16), pages 344–350. New York City, NY, USA.
- 35Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin Company, Boston, MA, USA.
- 36Simpson, E. H. (1951). The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society, Series B, 13, 238–241. DOI: 10.1111/j.2517-6161.1951.tb00088.x
- 37Stowell, D. (2017). Reducing Confounding Factors in Automatic Acoustic Recognition of Individual Birds. In Workshop on “Horses” in Applied Machine Learning (HORSE 2017), http://c4dm.eecs.qmul.ac.uk/horse2017/HORSE2017_Stowell.pdf
- 38Sturm, B. L. (2014a). A Simple Method to Determine if a Music Information Retrieval System Is a “Horse”. IEEE Transactions on Multimedia, 16(6), 1636–1644. DOI: 10.1109/TMM.2014.2330697
- 39Sturm, B. L. (2014b). The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval. Journal of New Music Research, 43(2), 147–172. DOI: 10.1080/09298215.2014.894533
- 40Sturm, B. L. (2016a). Revisiting Priorities: Improving MIR Evaluation Practices. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR’16), New York City, NY, USA.
- 41Sturm, B. L. (2016b). The “Horse” Inside: Seeking Causes of the Behaviours of Music Content Analysis Systems. Computers in Entertainment, Special Issue on Musical Metacreation, 14(2). DOI: 10.1145/2967507
- 42Trochim, W. M. K., & Donnelly, J. P. (2007). The Research Methods Knowledge Base. Atomic Dog, 3rd edition.
- 43Tzanetakis, G., & Cook, P. (2002). Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302. DOI: 10.1109/TSA.2002.800560
- 44Urbano, J., Schedl, M., & Serra, X. (2013). Evaluation in Music Information Retrieval. Journal of Intelligent Information Systems, 41(3), 345–369. DOI: 10.1007/s10844-013-0249-4
- 45Weihs, C., Jannach, D., Vatolkin, I., & Rudolph, G. (Eds.) (2017). Music Data Analysis: Foundations and Applications. CRC Press.
