References
- 1Barr, D. R., and Thomas, M. U. (1977). An eigenvector condition for Markov chain lumpability. Operations Research, 25(6):1028–1031. DOI: 10.1287/opre.25.6.1028
- 2Brooks, F. P., Hopkins, A., Neumann, P. G., and Wright, W. V. (1957). An experiment in musical composition. IRE Transactions on Electronic Computers, EC-6(3):175–182. DOI: 10.1109/TEC.1957.5222016
- 3Choromanski, K. M., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J. Q., Mohiuddin, A., Kaiser, L., Belanger, D. B., Colwell, L. J., and Weller, A. (2021). Rethinking attention with performers. In 9th International Conference on Learning Representations (ICLR).
- 4Cleary, J., and Witten, I. (1984). Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, 32(4):396–402. DOI: 10.1109/TCOM.1984.1096090
- 5Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. Journal of New Music Research, 24(1):51–73. DOI: 10.1080/09298219508570672
- 6Cover, T. M., and Thomas, J. A. (2005). Entropy, Relative Entropy, and Mutual Information, chapter 2, pages 13–55. John Wiley & Sons, Ltd. DOI: 10.1002/047174882X.ch2
- 7Eck, D., and Schmidhuber, J. (2002). Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. In Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, pages 747–756. IEEE. DOI: 10.1109/NNSP.2002.1030094
- 8Egermann, H., Pearce, M. T., Wiggins, G. A., and McAdams, S. (2013). Probabilistic models of expectation violation predict psychophysiological emotional responses to live concert music. Cognitive, Affective, & Behavioral Neuroscience, 13(3):533–553. DOI: 10.3758/s13415-013-0161-y
- 9Graves, A., Wayne, G., and Danihelka, I. (2014). Neural Turing machines. CoRR, abs/1410.5401.
- 10Gumbel, E. J. (1935). Les valeurs extrêmes des distributions statistiques. Annales de l’Institut Henri Poincaré, 5(2):115–158.
- 11Huang, C. A., Vaswani, A., Uszkoreit, J., Shazeer, N., Hawthorne, C., Dai, A. M., Hoffman, M. D., and Eck, D. (2018). An improved relative self-attention mechanism for transformer with application to music generation. CoRR, abs/1809.04281.
- 12Jang, E., Gu, S., and Poole, B. (2017). Categorical reparameterization with Gumbel-softmax. In 5th International Conference on Learning Representations (ICLR). OpenReview.net.
- 13Kannan, D., Sharpe, D. J., Swinburne, T. D., and Wales, D. J. (2020). Optimal dimensionality reduction of Markov chains using graph transformation. The Journal of Chemical Physics, 153(24):244108. DOI: 10.1063/5.0025174
- 14Katehakis, M. N., and Smit, L. C. (2012). A successive lumping procedure for a class of Markov chains. Probability in the Engineering and Informational Sciences, 26(4):483–508. DOI: 10.1017/S0269964812000150
- 15Katharopoulos, A., Vyas, A., Pappas, N., and Fleuret, F. (2020). Transformers are RNNs: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning, pages 5156–5165. PMLR.
- 16Kingma, D. P., and Ba, J. (2015). Adam: A method for stochastic optimization. In Bengio, Y. and LeCun, Y., editors, 3rd International Conference on Learning Representations (ICLR).
- 17Klambauer, G., Unterthiner, T., Mayr, A., and Hochreiter, S. (2017). Self-normalizing neural networks. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pages 971–980.
- 18Lattner, S., Chacon, C. E. C., and Grachten, M. (2015a). Pseudo-supervised training improves unsupervised melody segmentation. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), pages 2459–2465.
- 19Lattner, S., Grachten, M., Agres, K., and Chacon, C. E. C. (2015b). Probabilistic segmentation of musical sequences using restricted Boltzmann machines. In Proceedings of the 5th International Conference on Mathematics and Computation in Music (MCM), pages 323–334. DOI: 10.1007/978-3-319-20603-5_33
- 20LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551. DOI: 10.1162/neco.1989.1.4.541
- 21Liutkus, A., Cifka, O., Wu, S.-L., Simsekli, U., Yang, Y.-H., and Richard, G. (2021). Relative positional encoding for transformers with linear complexity. In International Conference on Machine Learning, pages 7067–7079. PMLR.
- 22Maddison, C. J., Mnih, A., and Teh, Y. W. (2017). The concrete distribution: A continuous relaxation of discrete random variables. In 5th International Conference on Learning Representations (ICLR). OpenReview.net.
- 23Maddison, C. J., Tarlow, D., and Minka, T. (2014). A* sampling. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pages 3086–3094.
- 24Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A., d’Alche-Buc, F., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
- 25Pearce, M. (2005). The Construction and Evaluation of Statistical Models of Melodic Structure in Music Perception and Composition. PhD thesis, Department of Computing, City University, London, UK.
- 26Pearce, M., and Mullensiefen, D. (2017). Compression-based modelling of musical similarity perception. Journal of New Music Research, 46(2):135–155. DOI: 10.1080/09298215.2017.1305419
- 27Pearce, M. T., Mullensiefen, D., and Wiggins, G. A. (2010). Melodic grouping in music information retrieval: New methods and applications. In Advances in Music Information Retrieval, pages 364–388. Springer. DOI: 10.1007/978-3-642-11674-2_16
- 28Pinkerton, R. C. (1956). Information theory and melody. Scientific American, 194(2):77–87. DOI: 10.1038/scientificamerican0256-77
- 29Quastler, H. (1955). Discussion, following mathematical theory of word formation, by W. Fucks. In Information Theory: Third London Symposium, volume 168. New York: Academic Press.
- 30Roberts, M. G. (1982). Local Order Estimating Markovian Analysis for Noiseless Source Coding and Authorship Identification. PhD thesis, Stanford University.
- 31Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088):533–536. DOI: 10.1038/323533a0
- 32Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., and Lillicrap, T. (2016). Meta-learning with memory-augmented neural networks. In Balcan, M. F. and Weinberger, K. Q., editors, Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1842–1850, New York, New York, USA. PMLR.
- 33Schlag, I., Irie, K., and Schmidhuber, J. (2021). Linear transformers are secretly fast weight programmers. In Meila, M. and Zhang, T., editors, Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 9355–9366. PMLR.
- 34Schmidhuber, J. (1992). Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 4(1):131–139. DOI: 10.1162/neco.1992.4.1.131
- 35Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423. DOI: 10.1002/j.1538-7305.1948.tb01338.x
- 36Shaw, P., Uszkoreit, J., and Vaswani, A. (2018). Self-attention with relative position representations. In Walker, M. A., Ji, H., and Stent, A., editors, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), volume 2, pages 464–468. Association for Computational Linguistics. DOI: 10.18653/v1/N18-2074
- 37Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In Bengio, Y. and LeCun, Y., editors, 2nd International Conference on Learning Representations (ICLR), Workshop Track Proceedings.
- 38Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958.
- 39Sturm, B. L., Santos, J. F., Ben-Tal, O., and Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In Proceedings of the Conference on Computer Simulation of Musical Creativity.
- 40van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. W., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. In 9th ISCA Speech Synthesis Workshop, page 125. ISCA.
- 41Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 30, pages 5998–6008. Curran Associates, Inc.
- 42Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., and Wierstra, D. (2016). Matching networks for one shot learning. In Lee, D. D., Sugiyama, M., von Luxburg, U., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems, pages 3630–3638.
- 43Witten, I., and Bell, T. (1991). The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37:1085–1094. DOI: 10.1109/18.87000
- 44Yang, L., Chou, S., and Yang, Y. (2017). MidiNet: A convolutional generative adversarial network for symbolic-domain music generation. In Cunningham, S. J., Duan, Z., Hu, X., and Turnbull, D., editors, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), pages 324–331.
