Abstract
Several companies now offer platforms for users to create music at unprecedented scale via textual prompting. As the quality of this music rises, concern grows about how to differentiate AI-generated music from human-made music, with implications for content identification, copyright enforcement, and music recommendation systems. This article explores the detection of AI-generated music by assembling and studying a large dataset of music audio recordings (30,000 full tracks totaling 1,770 h, 33 m, and 31 s in duration), of which 10,000 are from the Million Song Dataset (Bertin-Mahieux et al., 2011) and 20,000 are generated and released by users of two popular AI music platforms: Suno and Udio. We build and evaluate several AI music detectors operating on Contrastive Language–Audio Pretraining (CLAP) embeddings of the music audio, then compare them to a commercial baseline system as well as an open-source one. We apply various audio transformations to measure their impact on detector performance and find that the commercial baseline system is easily fooled by simply resampling audio to 22.05 kHz. We argue that careful consideration needs to be given to the experimental design underlying work in this area, as well as to the very definition of 'AI music.' We release all our code at https://github.com/lcrosvila/ai-music-detection.
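
To illustrate the transformation that defeated the commercial baseline, the following is a minimal sketch of resampling a track to 22.05 kHz before detection. It assumes the librosa and soundfile libraries, which the abstract does not name, and a hypothetical input file `track.wav`.

```python
# Minimal sketch of the 22.05 kHz resampling transformation that fooled the
# commercial baseline detector. Assumes librosa and soundfile are installed;
# "track.wav" is a hypothetical input file.
import librosa
import soundfile as sf

# Load at the file's native sample rate (sr=None disables librosa's
# default resampling on load).
y, sr = librosa.load("track.wav", sr=None)

# Downsample to 22.05 kHz, discarding spectral content above ~11 kHz.
y_22k = librosa.resample(y, orig_sr=sr, target_sr=22050)

# Write the transformed audio for submission to a detector.
sf.write("track_22k.wav", y_22k, 22050)
```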
