The Potential of Unsupervised Induction of Harmonic Syntax for Jazz

Ruben Cartuyvels; John Koslovsky; Marie-Francine Moens

doi:10.5334/tismir.217

Abstract

Music theorists have long studied and proposed hierarchical structures that describe a syntax of harmony, but algorithms that model these structures either require costly expert annotations for training or rely on music theorists' presuppositions about harmonic syntax. We build upon a line of work that models harmonic sequences with probabilistic context-free grammars (PCFGs), inspired by the well-known formalism for syntax in human language. By using neural networks for parameter sharing when estimating PCFG rule probabilities, we learn the grammar in an entirely unsupervised manner. Our model induces a harmonic syntax purely from data, with minimal bias and with parse trees as latent variables, simply by maximizing the likelihood of the training sequences. For the first time, this removes the need both for expert-annotated harmonic syntax trees and for human-defined grammar rules. We propose improvements inspired by music theory, including chord symbol representations and a training objective that facilitates the inclusion of short, frequent chord progressions based on musical relations. Experiments show that our methods can model harmony in datasets of jazz pieces, often producing realistic parse trees that overlap with expert annotations, even though the model never sees these annotations during training. Code, models, and predictions are publicly available.
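To make concrete the idea of treating parse trees as latent variables, the following minimal sketch computes the likelihood of a chord sequence under a PCFG by summing over all parse trees with the inside algorithm. The grammar, its symbols (`T`, `D`, `P`), and all probabilities are hypothetical toy values chosen for illustration; they are not taken from the paper. The model described above obtains rule probabilities from neural networks, whereas this sketch assumes a fixed table.

```python
# Hypothetical toy PCFG in Chomsky normal form (illustrative values only).
# For each parent symbol, binary and lexical rule probabilities sum to 1.
BINARY = {  # parent -> {(left child, right child): probability}
    "T": {("T", "T"): 0.1, ("D", "T"): 0.3},
    "D": {("P", "D"): 0.4},
}
LEXICAL = {  # parent -> {chord symbol: probability}
    "T": {"Cmaj7": 0.6},
    "D": {"G7": 0.6},
    "P": {"Dm7": 1.0},
}


def sequence_likelihood(chords, start="T"):
    """Return p(chords), summed over all parse trees rooted in `start`."""
    n = len(chords)
    if n == 0:
        return 0.0
    # inside[(i, j)][A] = total probability that symbol A derives chords[i:j]
    inside = {}
    for i, chord in enumerate(chords):
        inside[(i, i + 1)] = {
            parent: rules[chord]
            for parent, rules in LEXICAL.items()
            if chord in rules
        }
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):  # split point between the two children
                left_cell, right_cell = inside[(i, k)], inside[(k, j)]
                for parent, rules in BINARY.items():
                    for (left, right), prob in rules.items():
                        if left in left_cell and right in right_cell:
                            cell[parent] = cell.get(parent, 0.0) + (
                                prob * left_cell[left] * right_cell[right]
                            )
            inside[(i, j)] = cell
    return inside[(0, n)].get(start, 0.0)


if __name__ == "__main__":
    print(sequence_likelihood(["Dm7", "G7", "Cmaj7"]))
    print(sequence_likelihood(["Cmaj7", "Cmaj7", "Cmaj7"]))
```

Under this toy grammar, the ii-V-I sequence has a single parse with probability of about 0.0432, while the three-tonic sequence has two parses whose probabilities add up to about 0.00432. In an unsupervised setting, the negative logarithm of this quantity would serve as the training loss, so no annotated trees are needed.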

