The AI Music Arms Race: On the Detection of AI-Generated Music

Open Access | Jun 2025

References

  1. Afchar, D., Brocal, G. M., and Hennequin, R. (2024). Detecting music deepfakes is easy but actually hard. arXiv:2405.04181.
  2. Agostinelli, A., Denk, T. I., Borsos, Z., Engel, J., Verzetti, M., Caillon, A., Huang, Q., Jansen, A., Roberts, A., Tagliasacchi, M., Sharifi, M., Zeghidour, N., and Frank, C. (2023). MusicLM: Generating music from text. arXiv:2301.11325.
  3. Bertin‑Mahieux, T., Eck, D., and Mandel, M. (2010). Automatic tagging of audio: The state‑of‑the‑art. In Machine audition: Principles, algorithms and systems. IGI Publishing.
  4. Bertin‑Mahieux, T., Ellis, D. P., Whitman, B., and Lamere, P. (2011). The million song dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), pp. 591–596. Miami, Florida.
  5. Boden, M. A. (1998). Creativity and artificial intelligence. Artificial Intelligence, 103(1), 347–356.
  6. Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J. R., and Serra, X. (2013). Essentia: An audio analysis library for music information retrieval. In Proceedings of the 14th International Society for Music Information Retrieval Conference, pp. 493–498. ISMIR.
  7. Choi, K., Fazekas, G., Sandler, M., and Cho, K. (2017). Convolutional recurrent neural networks for music classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2392–2396. IEEE.
  8. Collingridge, D. (1980). The social control of technology. St. Martin's Press.
  9. Dauban, N. (2024). Our AI‑generated music identification journey. IRCAM Amplify. https://www.ircamamplify.io/blog/ai-generated-music-identification-journey.
  10. European Union. (2019). Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market. Official Journal of the European Union, L 130, 92–125.
  11. Geirhos, R., Jacobsen, J.‑H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., and Wichmann, F. A. (2020). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 665–673.
  12. Goldmedia. (2024). AI and music: Market development and impact on music authors and creators in Germany and France. GEMA, Sacem, and Gold Media.
  13. Holzapfel, A., Sturm, B. L., and Coeckelbergh, M. (2018). Ethical dimensions of music information retrieval technology. Transactions of the International Society for Music Information Retrieval, 1(1), 44–55.
  14. Kaliakatsos‑Papakostas, M. A., Epitropakis, M. G., and Vrahatis, M. N. (2011). Weighted Markov chain model for musical composer identification. In Applications of evolutionary computation (Vol. 6625, pp. 334–343). Springer.
  15. Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., and Zou, J. (2023). GPT detectors are biased against non‑native English writers. Patterns, 4(7), Article 100779.
  16. Manaris, B., Romero, J., Machado, P., Krehbiel, D., Hirzel, T., Pharr, W., and Davis, R. B. (2005). Zipf's law, music classification, and aesthetics. Computer Music Journal, 29(1), 55–69.
  17. McCormack, J., Llano, M. T., Krol, S. J., and Rajcic, N. (2024). No longer trending on Artstation: Prompt analysis of generative AI art. In International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar), pp. 279–295. Springer.
  18. McInnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426.
  19. Miranda, F. M., Köhnecke, N., and Renard, B. Y. (2023). HiClass: A Python library for local hierarchical classification compatible with scikit‑learn. Journal of Machine Learning Research, 24(29), 1–17.
  20. Müller, M. (2015). Fundamentals of music processing: Audio, analysis, algorithms, applications (Vol. 5). Springer.
  21. Nguyen, T. T., Nguyen, Q. V. H., Nguyen, D. T., Nguyen, D. T., Huynh‑The, T., Nahavandi, S., Nguyen, T. T., Pham, Q.‑V., and Nguyen, C. M. (2022). Deep learning for deepfakes creation and detection: A survey. Computer Vision and Image Understanding, 223, Article 103525.
  22. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit‑learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  23. Pelly, L. (2025). Mood machine: The rise of Spotify and the costs of the perfect playlist. Hodder & Stoughton.
  24. Peters, J., Janzing, D., and Schölkopf, B. (2017). Elements of causal inference: Foundations and learning algorithms. MIT Press.
  25. Pons, J., and Serra, X. (2019). musicnn: Pre‑trained convolutional neural networks for music audio tagging. arXiv:1909.06654.
  26. Rahman, M. A., Hakim, Z. I. A., Sarker, N. H., Paul, B., and Fattah, S. A. (2025). SONICS: Synthetic or not ‑ identifying counterfeit songs. In The Thirteenth International Conference on Learning Representations. ICLR.
  27. Ren, H., Li, L., Liu, C.‑H., Wang, X., and Hu, S. (2024). Improving generalization for AI‑synthesized voice detection. arXiv:2412.19279.
  28. Sanchez, T. (2023). Examining the text‑to‑image community of practice: Why and how do people prompt generative AIs? In Proceedings of the 15th Conference on Creativity and Cognition, New York, NY, USA, pp. 43–61. Association for Computing Machinery.
  29. Silla, C. N., and Freitas, A. A. (2011). A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1), 31–72.
  30. Sturm, B. L. T., Déguernel, K., Huang, R. S., Kaila, A.‑K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O. R., Collins, N., Drott, E., Sterne, J., Holzapfel, A., and Ben‑Tal, O. (2024). AI music studies: Preparing for the coming flood. In AIMC 2024. Oxford, United Kingdom.
  31. Sun, C., Jia, S., Hou, S., and Lyu, S. (2023). AI‑synthesized voice detection using neural vocoder artifacts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 904–912. IEEE.
  32. US District Court for the District of Massachusetts. (2024). UMG Recordings, Inc. v. Suno, Inc. (1:24‑cv‑11611). https://www.courtlistener.com/docket/68878608/umg-recordings-inc-v-suno-inc/.
  33. US District Court for the Southern District of New York. (2024). UMG Recordings, Inc. v. Uncharted Labs, Inc. (1:24‑cv‑04777). https://www.courtlistener.com/docket/68878697/umg-recordings-inc-v-uncharted-labs-inc-dba-udiocom/.
  34. Velardo, V. (2022, April 14). The sound of AI (conversations): Valerio Velardo interviews Alex Mitchell [Video]. YouTube. https://youtu.be/iyTJF7b6BwE.
  35. Walters, W. H. (2023). The effectiveness of software designed to detect AI‑generated writing: A comparison of 16 AI text detectors. Open Information Science, 7(1), Article 20220158.
  36. Wang, R., Juefei‑Xu, F., Huang, Y., Guo, Q., Xie, X., Ma, L., and Liu, Y. (2020). DeepSonar: Towards effective and robust detection of AI‑synthesized fake voices. In Proceedings of the 28th ACM International Conference on Multimedia, pp. 1207–1216. Association for Computing Machinery.
  37. Weber‑Wulff, D., Anohina‑Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero‑Dib, J., Popoola, O., Šigut, P., and Waddington, L. (2023). Testing of detection tools for AI‑generated text. International Journal for Educational Integrity, 19(1), Article 26.
  38. Wołkowicz, J., Kulka, Z., and Kešelj, V. (2008). N‑gram‑based approach to composer recognition. Archives of Acoustics, 33(1), 43–55.
  39. Wu, Y., Chen, K., Zhang, T., Hui, Y., Berg‑Kirkpatrick, T., and Dubnov, S. (2023). Large‑scale contrastive language‑audio pretraining with feature fusion and keyword‑to‑caption augmentation. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE.
  40. Zang, Y., Zhang, Y., Heydari, M., and Duan, Z. (2024). SingFake: Singing voice deepfake detection. In ICASSP 2024 – 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156–12160. IEEE.
DOI: https://doi.org/10.5334/tismir.254 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jan 27, 2025
Accepted on: May 22, 2025
Published on: Jun 25, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Laura Cros Vila, Bob L. T. Sturm, Luca Casini, David Dalmazzo, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.