References
- 1Álvarez-Díaz, M., Muñiz-Bascón, L. M., Soria-Alemany, A., Veintimilla-Bonet, A., and Fernández-Alonso, R. (2020). On the design and validation of a rubric for the evaluation of performance in a musical contest. International Journal of Music Education, 39(1): 66–79. DOI: 10.1177/0255761420936443
- 2Andrich, D. (2004). Controversy and the Rasch model. Medical Care, 42(1): 7–16. DOI: 10.1097/01.mlr.0000103528.48582.7c
- 3Blangiardo, M., and Baio, G. (2014). Evidence of bias in the Eurovision Song Contest: Modelling the votes using Bayesian hierarchical models. Journal of Applied Statistics, 41(10): 2312–22. DOI: 10.1080/02664763.2014.909792
- 4Bond, T. G., Yan, Z., and Heene, M. (2020). Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Routledge, 4th edition. DOI: 10.4324/9780429030499
- 5Bruine de Bruin, W. (2005). Save the last dance for me: Unwanted serial position effects in jury evaluations. Acta Psychologica, 118(3): 245–60. DOI: 10.1016/j.actpsy.2004.08.005
- 6Carnovalini, F., and Rodà, A. (2020). Computational creativity and music generation systems: An introduction to the state of the art. Frontiers in Artificial Intelligence, 3(14). DOI: 10.3389/frai.2020.00014
- 7Downie, J. S. (2004). The scientific evaluation of music information retrieval systems: Foundations and future. Computer Music Journal, 28(2): 12–23. DOI: 10.1162/014892604323112211
- 8Flexer, A., and Grill, T. (2016). The problem of limited inter-rater agreement in modelling music similarity. Journal of New Music Research, 45(3): 239–51. DOI: 10.1080/09298215.2016.1200631
- 9Flôres, R. G., Jr., and Ginsburgh, V. A. (1996). The Queen Elisabeth musical competition: How fair is the final ranking? Journal of the Royal Statistical Society, Series D, 45(1): 97–104. DOI: 10.2307/2348415
- 10Gatherer, D. (2006). Comparison of Eurovision Song Contest simulation with actual results reveals shifting patterns of collusive voting alliances. Journal of Artificial Societies and Social Simulation, 9(2).
- 11Ginsburgh, V., and Noury, A. G. (2008). The Eurovision Song Contest: Is voting political or cultural? European Journal of Political Economy, 24(1): 41–52. DOI: 10.1016/j.ejpoleco.2007.05.004
- 12Glejser, H., and Heyndels, B. (2001). Efficiency and inefficiency in the ranking in competitions: The case of the Queen Elisabeth music contest. Journal of Cultural Economics, 25(2): 109–29. DOI: 10.1023/A:1007659804416
- 13Haan, M. A., Dijkstra, G., and Dijkstra, P. T. (2005). Expert judgment versus public opinion: Evidence from the Eurovision Song Contest. Journal of Cultural Economics, 29(1): 59–78. DOI: 10.1007/s10824-005-6830-0
- 14Huang, C.-Z. A., Koops, H. V., Newton-Rex, E., Dinculescu, M., and Cai, C. (2020). Human–AI cocreation in songwriting. In Proceedings of the 21st International Society for Music Information Retrieval Conference, pages 708–16, Montréal, Québec.
- 15Jordanous, A. (2012). A standardised procedure for evaluating creative systems: Computational creativity evaluation based on what it is to be creative. Cognitive Computation, 4(3): 246–279. DOI: 10.1007/s12559-012-9156-1
- 16Koops, H. V., de Haas, W. B., Burgoyne, J. A., Bransen, J., Kent-Muller, A., and Volk, A. (2019). Annotator subjectivity in harmony annotations of popular music. Journal of New Music Research, 48(3): 232–52. DOI: 10.1080/09298215.2019.1613436
- 17Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1): 1–14. DOI: 10.2307/1269547
- 18Latimer, M. E., Bergee, M. J., and Cohen, M. L. (2010). Reliability and perceived pedagogical utility of a weighted music performance assessment rubric. Journal of Research in Music Education, 58(2): 168–83. DOI: 10.1177/0022429410369836
- 19Lemoine, N. P. (2019). Moving beyond noninformative priors: Why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7): 912–928. DOI: 10.1111/oik.05985
- 20Linacre, J. M. (1989). Many-Facet Rasch Measurement. MESA Press, Chicago.
- 21Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3(1): 85–106.
- 22Lord, F. M., and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison–Wesley.
- 23Merkle, E. C., Furr, D., and Rabe-Hesketh, S. (2019). Bayesian comparison of latent variable models: Conditional versus marginal likelihoods. Psychometrika, 84(3): 802–29. DOI: 10.1007/s11336-019-09679-0
- 24Nunnally, J. C. (1978). Psychometric Theory. McGraw-Hill, 2nd edition.
- 25Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Danish Institute for Educational Research, Copenhagen.
- 26Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14: 58–93. DOI: 10.1163/24689300-01401006
- 27Seashore, H. G. (1955). Methods of expressing test scores. Test Service Bulletin, 48: 7–10.
- 28Springer, D. G., and Bradley, K. D. (2017). Investigating adjudicator bias in concert band evaluations: An application of the many-facets Rasch model. Musicae Scientiae, 22(3): 377–93. DOI: 10.1177/1029864917697782
- 29Stan Development Team. (2021). Stan modeling language users guide and reference manual, version 2.26. https://mc-stan.org.
- 30Sturm, B. L. (2016). The ‘horse’ inside: Seeking causes behind the behaviors of music content analysis systems. Computers in Entertainment, 14(2): 1–32. DOI: 10.1145/2967507
- 31Urbano, J., Schedl, M., and Serra, X. (2013). Evaluation in music information retrieval. Journal of Intelligent Information Systems, 41(3): 345–369. DOI: 10.1007/s10844-013-0249-4
- 32Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5): 1413–32. DOI: 10.1007/s11222-016-9696-4
- 33Wesolowski, B. C., Wind, S. A., and Engelhard, G. (2016). Examining rater precision in music performance assessment: An analysis of rating scale structure using the multifaceted Rasch partial credit model. Music Perception, 33(5): 662–78. DOI: 10.1525/mp.2016.33.5.662
- 34Wright, B. D., and Masters, G. N. (1982). Rating Scale Analysis. MESA Press, Chicago.
- 35Wright, B. D., and Mok, M. M. C. (2004). An overview of the family of Rasch measurement models. In Smith, E., and Smith, R., editors, Introduction to Rasch Measurement, pages 1–24. JAM Press, Maple Grove, MN.
- 36Yair, G., and Maman, D. (1996). The persistent structure of hegemony in the Eurovision Song Contest. Acta Sociologica, 39(3): 309–25. DOI: 10.1177/000169939603900303
- 37Yang, L.-C., and Lerch, A. (2018). On the evaluation of generative models in music. Neural Computing and Applications, 32(9): 4773–84. DOI: 10.1007/s00521-018-3849-7
