
Differentiable Short-Term Models for Efficient Online Learning and Prediction in Monophonic Music

Open Access | Nov 2022

DOI: https://doi.org/10.5334/tismir.123 | Journal eISSN: 2514-3298
Language: English
Submitted on: Nov 5, 2021
Accepted on: Sep 12, 2022
Published on: Nov 29, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.