Pop Music Highlighter: Marking the Emotion Keypoints
Open Access | Sep 2018

Abstract

The goal of music highlight extraction, or thumbnailing, is to extract a short consecutive segment of a piece of music that is somehow representative of the whole piece. In a previous work, we introduced an attention-based convolutional recurrent neural network that uses music emotion classification as a surrogate task for music highlight extraction, assuming that the most emotional part of a song usually corresponds to the highlight. This paper extends our previous work in two aspects. First, methodology-wise, we experiment with a new architecture that does not need any recurrent layers, making the training process faster. Moreover, we compare a late-fusion variant and an early-fusion variant to study which one better exploits the attention mechanism. Second, we conduct and report an extensive set of experiments comparing the proposed attention-based methods to a heuristic energy-based method, a structural repetition-based method, and three other simple feature-based methods. Due to the lack of public-domain labeled data for highlight extraction, following our previous work, we use the RWC-Pop 100-song data set to evaluate how the detected highlights overlap with any chorus sections of the songs. The experiments demonstrate the superior effectiveness of our methods over the competing methods. For reproducibility, we share the code and the pre-trained model at https://github.com/remyhuang/pop-music-highlighter/.
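Once a model has assigned an attention score to each time step of a song, the highlight can be read off as the consecutive window with the highest total attention. The following is a minimal sketch of that selection step; the function name, the fixed window length, and the toy scores are illustrative assumptions, not the authors' exact implementation:

```python
def select_highlight(scores, highlight_len):
    """Return (start, end) indices of the consecutive window of length
    `highlight_len` whose attention scores sum to the maximum.

    `scores` is a list of per-frame attention weights, e.g. as produced
    by an attention layer trained on an emotion-classification task.
    """
    if highlight_len >= len(scores):
        return 0, len(scores)
    # Sliding-window sum: update in O(1) per step instead of re-summing.
    window_sum = sum(scores[:highlight_len])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(scores) - highlight_len + 1):
        window_sum += scores[start + highlight_len - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
    return best_start, best_start + highlight_len

# Toy example: attention peaks around frames 4-6 (e.g. an emotional chorus),
# so a 3-frame highlight window lands there.
scores = [0.1, 0.1, 0.2, 0.5, 0.9, 1.0, 0.8, 0.2, 0.1]
print(select_highlight(scores, 3))  # → (4, 7)
```

This reflects the assumption stated above: the segment the emotion model attends to most strongly is taken as the thumbnail.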

DOI: https://doi.org/10.5334/tismir.14 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 3, 2018
Accepted on: Jun 3, 2018
Published on: Sep 4, 2018
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2018 Yu-Siang Huang, Szu-Yu Chou, Yi-Hsuan Yang, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.