‘We are Not Groupies… We are Band Aids’: Assessment Reliability in the AI Song Contest

Open Access | Dec 2021

References

  1. Álvarez-Díaz, M., Muñiz-Bascón, L. M., Soria-Alemany, A., Veintimilla-Bonet, A., and Fernández-Alonso, R. (2020). On the design and validation of a rubric for the evaluation of performance in a musical contest. International Journal of Music Education, 39(1): 66–79. DOI: 10.1177/0255761420936443
  2. Andrich, D. (2004). Controversy and the Rasch model. Medical Care, 42(1): 7–16. DOI: 10.1097/01.mlr.0000103528.48582.7c
  3. Blangiardo, M., and Baio, G. (2014). Evidence of bias in the Eurovision Song Contest: Modelling the votes using Bayesian hierarchical models. Journal of Applied Statistics, 41(10): 2312–22. DOI: 10.1080/02664763.2014.909792
  4. Bond, T. G., Yan, Z., and Heene, M. (2020). Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Routledge, 4th edition. DOI: 10.4324/9780429030499
  5. Bruine de Bruin, W. (2005). Save the last dance for me: Unwanted serial position effects in jury evaluations. Acta Psychologica, 118(3): 245–60. DOI: 10.1016/j.actpsy.2004.08.005
  6. Carnovalini, F., and Rodà, A. (2020). Computational creativity and music generation systems: An introduction to the state of the art. Frontiers in Artificial Intelligence, 3(14). DOI: 10.3389/frai.2020.00014
  7. Downie, J. S. (2004). The scientific evaluation of music information retrieval systems: Foundations and future. Computer Music Journal, 28(2): 12–23. DOI: 10.1162/014892604323112211
  8. Flexer, A., and Grill, T. (2016). The problem of limited inter-rater agreement in modelling music similarity. Journal of New Music Research, 45(3): 239–51. DOI: 10.1080/09298215.2016.1200631
  9. Flôres, R. G. Jr., and Ginsburgh, V. A. (1996). The Queen Elisabeth musical competition: How fair is the final ranking? Journal of the Royal Statistical Society, Series D, 45(1): 97–104. DOI: 10.2307/2348415
  10. Gatherer, D. (2006). Comparison of Eurovision Song Contest simulation with actual results reveals shifting patterns of collusive voting alliances. Journal of Artificial Societies and Social Simulation, 9(2).
  11. Ginsburgh, V., and Noury, A. G. (2008). The Eurovision Song Contest: Is voting political or cultural? European Journal of Political Economy, 24(1): 41–52. DOI: 10.1016/j.ejpoleco.2007.05.004
  12. Glejser, H., and Heyndels, B. (2001). Efficiency and inefficiency in the ranking in competitions: The case of the Queen Elisabeth music contest. Journal of Cultural Economics, 25(2): 109–29. DOI: 10.1023/A:1007659804416
  13. Haan, M. A., Dijkstra, G., and Dijkstra, P. T. (2005). Expert judgment versus public opinion: Evidence from the Eurovision Song Contest. Journal of Cultural Economics, 29(1): 59–78. DOI: 10.1007/s10824-005-6830-0
  14. Huang, C.-Z. A., Koops, H. V., Newton-Rex, E., Dinculescu, M., and Cai, C. (2020). Human–AI cocreation in songwriting. In Proceedings of the 21st International Society for Music Information Retrieval Conference, pages 708–16, Montréal, Québec.
  15. Jordanous, A. (2012). A standardised procedure for evaluating creative systems: Computational creativity evaluation based on what it is to be creative. Cognitive Computation, 4(3): 246–279. DOI: 10.1007/s12559-012-9156-1
  16. Koops, H. V., de Haas, W. B., Burgoyne, J. A., Bransen, J., Kent-Muller, A., and Volk, A. (2019). Annotator subjectivity in harmony annotations of popular music. Journal of New Music Research, 48(3): 232–52. DOI: 10.1080/09298215.2019.1613436
  17. Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1): 1–14. DOI: 10.2307/1269547
  18. Latimer, M. E., Bergee, M. J., and Cohen, M. L. (2010). Reliability and perceived pedagogical utility of a weighted music performance assessment rubric. Journal of Research in Music Education, 58(2): 168–83. DOI: 10.1177/0022429410369836
  19. Lemoine, N. P. (2019). Moving beyond noninformative priors: Why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7): 912–928. DOI: 10.1111/oik.05985
  20. Linacre, J. M. (1989). Many-Facet Rasch Measurement. MESA Press, Chicago.
  21. Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3(1): 85–106.
  22. Lord, F. M., and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison–Wesley.
  23. Merkle, E. C., Furr, D., and Rabe-Hesketh, S. (2019). Bayesian comparison of latent variable models: Conditional versus marginal likelihoods. Psychometrika, 84(3): 802–89. DOI: 10.1007/s11336-019-09679-0
  24. Nunnally, J. C. (1978). Psychometric Theory. McGraw-Hill, 2nd edition.
  25. Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Danish Institute for Educational Research, Copenhagen.
  26. Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14: 58–93. DOI: 10.1163/24689300-01401006
  27. Seashore, H. G. (1955). Methods of expressing test scores. Test Service Bulletin, 48: 7–10.
  28. Springer, D. G., and Bradley, K. D. (2017). Investigating adjudicator bias in concert band evaluations: An application of the many-facets Rasch model. Musicae Scientiae, 22(3): 377–93. DOI: 10.1177/1029864917697782
  29. Stan Development Team. (2021). Stan modeling language users guide and reference manual, version 2.26. https://mc-stan.org.
  30. Sturm, B. L. (2016). The ‘horse’ inside: Seeking causes behind the behaviors of music content analysis systems. Computers in Entertainment, 14(2): 1–32. DOI: 10.1145/2967507
  31. Urbano, J., Schedl, M., and Serra, X. (2013). Evaluation in music information retrieval. Journal of Intelligent Information Systems, 41(3): 345–369. DOI: 10.1007/s10844-013-0249-4
  32. Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5): 1413–32. DOI: 10.1007/s11222-016-9696-4
  33. Wesolowski, B. C., Wind, S. A., and Engelhard, G. (2016). Examining rater precision in music performance assessment: An analysis of rating scale structure using the multifaceted Rasch partial credit model. Music Perception, 33(5): 662–78. DOI: 10.1525/mp.2016.33.5.662
  34. Wright, B. D., and Masters, G. N. (1982). Rating Scale Analysis. MESA Press, Chicago.
  35. Wright, B. D., and Mok, M. M. C. (2004). An overview of the family of Rasch measurement models. In Smith, E., and Smith, R., editors, Introduction to Rasch Measurement, pages 1–24. JAM Press, Maple Grove, MN.
  36. Yair, G., and Maman, D. (1996). The persistent structure of hegemony in the Eurovision Song Contest. Acta Sociologica, 39(3): 309–25. DOI: 10.1177/000169939603900303
  37. Yang, L.-C., and Lerch, A. (2018). On the evaluation of generative models in music. Neural Computing and Applications, 32(9): 4773–84. DOI: 10.1007/s00521-018-3849-7
DOI: https://doi.org/10.5334/tismir.102 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 1, 2021
Accepted on: Jul 5, 2021
Published on: Dec 3, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 John Ashley Burgoyne, Hendrik Vincent Koops, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.