References
- Atrey, P. K., Hossain, M. A., El Saddik, A., & Kankanhalli, M. S. (2010). Multimodal fusion for multimedia analysis: A survey. Multimedia Systems, 16(6), 345–379. DOI: 10.1007/s00530-010-0182-0
- Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- Bartsch, M. A., & Wakefield, G. H. (2005). Audio thumbnailing of popular music using chroma-based representations. IEEE Transactions on Multimedia, 7(1), 96–104. DOI: 10.1109/TMM.2004.840597
- Bittner, R. M., Gu, M., Hernandez, G., Humphrey, E. J., Jehan, T., McCurry, P. H., & Montecchio, N. (2017). Automatic playlist sequencing and transitions. In: Proceedings of the International Society for Music Information Retrieval Conference, 442–448.
- Choi, K., Fazekas, G., Cho, K., & Sandler, M. (2017). A tutorial on deep learning for music information retrieval. arXiv preprint arXiv:1709.04396.
- Cooper, M., & Foote, J. (2003). Summarizing popular music via structural similarity analysis. In: Proceedings of the IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics, 127–130. DOI: 10.1109/ASPAA.2003.1285836
- Dieleman, S., & Schrauwen, B. (2014). End-to-end learning for music audio. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 6964–6968. DOI: 10.1109/ICASSP.2014.6854950
- Eronen, A. (2007). Chorus detection with combined use of MFCC and chroma features and image processing filters. In: Proceedings of the 10th International Conference on Digital Audio Effects, 229–236.
- Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122.
- Goto, M. (2003). SmartMusicKIOSK: Music listening station with chorus-search function. In: Proceedings of the ACM Symposium on User Interface Software and Technology, 31–40.
- Goto, M. (2006). A chorus section detection method for musical audio signals and its application to a music listening station. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1783–1794. DOI: 10.1109/TSA.2005.863204
- Goto, M., Hashiguchi, H., Nishimura, T., & Oka, R. (2002). RWC Music Database: Popular, classical and jazz music databases. In: Proceedings of the International Conference on Music Information Retrieval, 287–288.
- Grosche, P., Müller, M., & Kurth, F. (2010). Cyclic tempogram – a mid-level tempo representation for music signals. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 5522–5525.
- Ha, J.-W., Kim, A., Kim, C., Park, J., & Kim, S. (2017). Automatic music highlight extraction using convolutional recurrent attention networks. arXiv preprint arXiv:1712.05901.
- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. DOI: 10.1162/neco.1997.9.8.1735
- Huang, Y.-S., Chou, S.-Y., & Yang, Y.-H. (2017a). DJnet: A dream for making an automatic DJ. In: International Society for Music Information Retrieval Conference, Late-Breaking Paper, 1–2.
- Huang, Y.-S., Chou, S.-Y., & Yang, Y.-H. (2017b). Music thumbnailing via neural attention modeling of music emotion. In: Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 347–350.
- Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the International Conference on Machine Learning, 448–456.
- Lee, J.-Y., Kim, J.-Y., & Kim, H.-G. (2014). Music emotion classification based on music highlight detection. In: Proceedings of the IEEE International Conference on Information Science and Applications, 1–2.
- Liu, J.-Y., & Yang, Y.-H. (2016). Event localization in music auto-tagging. In: Proceedings of ACM Multimedia, 1048–1057. DOI: 10.1145/2964284.2964292
- Logan, B., & Chu, S. (2000). Music summarization using key phrases. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 749–752. DOI: 10.1109/ICASSP.2000.859068
- McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., & Nieto, O. (2015). LibROSA: Audio and music signal analysis in Python. In: Proceedings of the Python in Science Conference, 18–25.
- Mehrabi, A., Harte, C., Baume, C., & Dixon, S. (2017). Music thumbnailing for radio podcasts: A listener evaluation. Journal of the Audio Engineering Society, 65(6), 474–481. DOI: 10.17743/jaes.2017.0011
- Meintanis, K. A., & Shipman, F. M., III. (2008). Creating and evaluating multi-phrase music summaries. In: Proceedings of the International Conference on Music Information Retrieval, 507–512.
- Müller, M., & Ewert, S. (2011). Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In: Proceedings of the International Society for Music Information Retrieval Conference, 215–220.
- Nieto, O., & Bello, J. P. (2014). Music segment similarity using 2D-Fourier magnitude coefficients. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 664–668.
- Nieto, O., & Bello, J. P. (2016). Systematic exploration of computational music structure research. In: Proceedings of the International Society for Music Information Retrieval Conference, 547–553.
- Panda, R., Malheiro, R. M., & Paiva, R. P. (2018). Novel audio features for music emotion recognition. IEEE Transactions on Affective Computing. DOI: 10.1109/TAFFC.2018.2820691
- Peeters, G., La Burthe, A., & Rodet, X. (2002). Toward automatic music audio summary generation from signal analysis. In: Proceedings of the International Conference on Music Information Retrieval, 94–100.
- Serrà, J., Müller, M., Grosche, P., & Arcos, J. L. (2014). Unsupervised music structure annotation by time series structure features and segment similarity. IEEE Transactions on Multimedia, 16(5), 1229–1240. DOI: 10.1109/TMM.2014.2310701
- Shen, T., Zhou, T., Long, G., Jiang, J., Pan, S., & Zhang, C. (2017). DiSAN: Directional self-attention network for RNN/CNN-free language understanding. arXiv preprint arXiv:1709.04696.
- Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302. DOI: 10.1109/TSA.2002.800560
- van Balen, J., Burgoyne, J. A., Wiering, F., & Veltkamp, R. C. (2013). An analysis of chorus features in popular song. In: Proceedings of the International Society for Music Information Retrieval Conference, 107–112.
- van den Oord, A., Dieleman, S., & Schrauwen, B. (2013). Deep content-based music recommendation. In: Advances in Neural Information Processing Systems, 2643–2651.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
- Wang, X., Wu, Y., Chen, X., & Yang, D. (2013). Enhance popular music emotion regression by importing structure information. In: Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 1–4.
- Yadati, K., Larson, M., Liem, C. C., & Hanjalic, A. (2014). Detecting drops in electronic dance music: Content based approaches to a socially significant music event. In: Proceedings of the International Society for Music Information Retrieval Conference, 143–148.
- Yang, Y.-H., & Liu, J.-Y. (2013). Quantitative study of music listening behavior in a social and affective context. IEEE Transactions on Multimedia, 15(6), 1304–1315. DOI: 10.1109/TMM.2013.2265078
