References
1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. (2016). TensorFlow: A system for large-scale machine learning. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 265–283, Savannah, GA, USA.
2. Adel, T., Ghahramani, Z., and Weller, A. (2018). Discovering interpretable representations for both deep generative and discriminative models. In Proceedings of the International Conference on Machine Learning (ICML), pages 50–59, Stockholm, Sweden.
3. Blaauw, M., and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Proceedings of the Conference of the International Speech Communication Association (Interspeech), San Francisco, CA, USA. DOI: 10.21437/Interspeech.2016-1183
4. Çakir, E., and Virtanen, T. (2018). Musical instrument synthesis and morphing in multidimensional latent space using variational convolutional recurrent autoencoders. In Proceedings of the Audio Engineering Society Convention, New York, NY, USA.
5. Cheminée, P., Gherghinoiu, C., and Besnainou, C. (2005). Analyses des verbalisations libres sur le son du piano versus analyses acoustiques. In Proceedings of the Conference on Interdisciplinary Musicology (CIM05), Montréal, Canada.
6. Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A., and Bengio, Y. (2015). A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems (NIPS), pages 2980–2988, Montréal, Canada.
7. Colonel, J., Curro, C., and Keene, S. (2017). Improving neural net auto-encoders for music synthesis. In Proceedings of the Audio Engineering Society Convention, New York, NY, USA.
8. Day, W., and Edelsbrunner, H. (1984). Efficient algorithms for agglomerative hierarchical clustering methods. Journal of Classification, 1(1): 7–24. DOI: 10.1007/BF01890115
9. Donahue, C., McAuley, J., and Puckette, M. (2019). Adversarial audio synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA.
10. Dubois, D. (2000). Categories as acts of meaning: The case of categories in olfaction and audition. Cognitive Science Quarterly, 1(1): 35–68.
11. Engel, J., Agrawal, K. K., Chen, S., Gulrajani, I., Donahue, C., and Roberts, A. (2019). GANSynth: Adversarial neural audio synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA.
12. Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia.
13. Esling, P., Chemla-Romeu-Santos, A., and Bitton, A. (2018). Generative timbre spaces with variational audio synthesis. In Proceedings of the International Conference on Digital Audio Effects (DAFx), Aveiro, Portugal.
14. Esling, P., Masuda, N., Bardet, A., Despres, R., and Chemla-Romeu-Santos, A. (2020). Flow synthesizer: Universal audio synthesizer control with normalizing flows. Applied Sciences, 10(1): 302. DOI: 10.3390/app10010302
15. Faure, A. (2000). Des sons aux mots, comment parle-t-on du timbre musical ? PhD thesis, Ecole des Hautes Etudes en Sciences Sociales (EHESS).
16. Fraccaro, M., Sønderby, S. K., Paquet, U., and Winther, O. (2016). Sequential neural models with stochastic layers. In Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain.
17. Fritz, C., Blackwell, A., Cross, I., Woodhouse, J., and Moore, B. (2012). Exploring violin sound quality: Investigating English timbre descriptors and correlating resynthesized acoustical modifications with perceptual properties. The Journal of the Acoustical Society of America, 131(1): 783–794. DOI: 10.1121/1.3651790
18. Garnier, M., Henrich, N., Castellengo, M., Sotiropoulos, D., and Dubois, D. (2007). Characterisation of voice quality in Western lyrical singing: From teachers' judgements to acoustic descriptions. Journal of Interdisciplinary Music Studies, 1(2): 62–91.
19. Girin, L., Hueber, T., Roche, F., and Leglaive, S. (2019). Notes on the use of variational autoencoders for speech and audio spectrogram modeling. In Proceedings of the International Conference on Digital Audio Effects (DAFx), Birmingham, UK.
20. Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org
21. Grey, J. (1977). Multidimensional perceptual scaling of musical timbres. The Journal of the Acoustical Society of America, 61(5): 1270–1277. DOI: 10.1121/1.381428
22. Grey, J., and Moorer, J. (1977). Perceptual evaluations of synthesized musical instrument tones. The Journal of the Acoustical Society of America, 62(2): 454–462. DOI: 10.1121/1.381508
23. Griffin, D., and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2): 236–243. DOI: 10.1109/TASSP.1984.1164317
24. Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017). β-VAE: Learning basic visual concepts with a constrained variational framework. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France.
25. Hinton, G., and Salakhutdinov, R. (2007). Using deep belief nets to learn covariance kernels for Gaussian processes. In Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada.
26. Hsu, W.-N., Zhang, Y., and Glass, J. (2017a). Learning latent representations for speech generation and transformation. In Proceedings of the Conference of the International Speech Communication Association (Interspeech), Stockholm, Sweden. DOI: 10.21437/Interspeech.2017-349
27. Hsu, W.-N., Zhang, Y., and Glass, J. (2017b). Unsupervised learning of disentangled and interpretable representations from sequential data. In Advances in Neural Information Processing Systems (NIPS), pages 1878–1889, Long Beach, CA, USA.
28. Huber, R., and Kollmeier, B. (2006). PEMO-Q: A new method for objective audio quality assessment using a model of auditory perception. IEEE Transactions on Audio, Speech, and Language Processing, 14(6): 1902–1911. DOI: 10.1109/TASL.2006.883259
29. Iverson, P., and Krumhansl, C. (1993). Isolating the dynamic attributes of musical timbre. The Journal of the Acoustical Society of America, 94(5): 2595–2603. DOI: 10.1121/1.407371
30. Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11(2): 37–50. DOI: 10.1111/j.1469-8137.1912.tb05611.x
31. Jillings, N., Moffat, D., De Man, B., and Reiss, J. (2015). Web Audio Evaluation Tool: A browser-based listening test environment. In Proceedings of the Sound and Music Computing Conference (SMC), Maynooth, Ireland.
32. Kendall, R. A., Carterette, E. C., and Hajda, J. M. (1999). Perceptual and acoustical features of natural and synthetic orchestral instrument tones. Music Perception, 16(3): 327–363. DOI: 10.2307/40285796
33. Kingma, D., and Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA.
34. Kingma, D., and Welling, M. (2014). Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, Canada.
35. Krimphoff, J., McAdams, S., and Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II. Analyses acoustiques et quantification psychophysique. Le Journal de Physique IV, 4(C5): C5-625. DOI: 10.1051/jp4:19945134
36. Krishnan, R., Shalit, U., and Sontag, D. (2017). Structured inference networks for nonlinear state space models. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.
37. Krumhansl, C. (1989). Why is musical timbre so hard to understand? Structure and Perception of Electroacoustic Sound and Music, 9: 43–53.
38. Lichte, W. (1941). Attributes of complex tones. Journal of Experimental Psychology, 28(6): 455. DOI: 10.1037/h0053526
39. Locatello, F., Tschannen, M., Bauer, S., Rätsch, G., Schölkopf, B., and Bachem, O. (2020). Disentangling factors of variation using few labels. In Proceedings of the International Conference on Learning Representations (ICLR).
40. Marozeau, J., de Cheveigné, A., McAdams, S., and Winsberg, S. (2003). The dependency of timbre on fundamental frequency. The Journal of the Acoustical Society of America, 114(5): 2946–2957. DOI: 10.1121/1.1618239
41. McAdams, S. (2019). The perceptual representation of timbre. In Timbre: Acoustics, Perception, and Cognition, pages 23–57. Springer. DOI: 10.1007/978-3-030-14832-4_2
42. McAdams, S., Beauchamp, J., and Meneguzzi, S. (1999). Discrimination of musical instrument sounds resynthesized with simplified spectrotemporal parameters. The Journal of the Acoustical Society of America, 105(2): 882–897. DOI: 10.1121/1.426277
43. McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., and Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research, 58(3): 177–192. DOI: 10.1007/BF00419633
44. Miller, J., and Carterette, E. (1975). Perceptual space for musical structures. The Journal of the Acoustical Society of America, 58(3): 711–720. DOI: 10.1121/1.380719
45. Miranda, E. (2002). Computer Sound Design: Synthesis Techniques and Programming. Music Technology series. Focal Press.
46. Pati, A., and Lerch, A. (2020). Attribute-based regularization of latent spaces for variational autoencoders. Neural Computing and Applications, pages 1–16. DOI: 10.1007/s00521-020-05270-2
47. Peeters, G., Giordano, B., Susini, P., Misdariis, N., and McAdams, S. (2011). The Timbre Toolbox: Extracting audio descriptors from musical signals. The Journal of the Acoustical Society of America, 130(5): 2902–2916. DOI: 10.1121/1.3642604
48. Randolph, J. (2005). Free-marginal multirater kappa (multirater κfree): An alternative to Fleiss' fixed-marginal multirater kappa. In Joensuu Learning and Instruction Symposium, Joensuu, Finland.
49. Reymore, L., and Huron, D. (2020). Using auditory imagery tasks to map the cognitive linguistic dimensions of musical instrument timbre qualia. Psychomusicology: Music, Mind, and Brain. DOI: 10.1037/pmu0000263
50. Rezende, D., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the International Conference on Machine Learning (ICML), Beijing, China.
51. Rezende, D., and Mohamed, S. (2015). Variational inference with normalizing flows. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France.
52. Roche, F., Hueber, T., Limier, S., and Girin, L. (2019). Autoencoders for music sound modeling: A comparison of linear, shallow, deep, recurrent and variational models. In Proceedings of the Sound and Music Computing Conference (SMC), Málaga, Spain.
53. Samson, S., Zatorre, R., and Ramsay, J. (1997). Multidimensional scaling of synthetic musical timbre: Perception of spectral and temporal characteristics. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 51(4): 307. DOI: 10.1037/1196-1961.51.4.307
54. Traube, C. (2004). An interdisciplinary study of the timbre of the classical guitar. PhD thesis, McGill University.
55. von Bismarck, G. (1974). Timbre of steady sounds: A factorial investigation of its verbal attributes. Acta Acustica united with Acustica, 30(3): 146–159.
56. von Helmholtz, H. (1875). On the Sensations of Tone as a Physiological Basis for the Theory of Music. Longmans, Green. DOI: 10.1037/10838-000
57. Wedin, L., and Goude, G. (1972). Dimension analysis of the perception of instrumental timbre. Scandinavian Journal of Psychology, 13(1): 228–240. DOI: 10.1111/j.1467-9450.1972.tb00071.x
58. Wessel, D. (1979). Timbre space as a musical control structure. Computer Music Journal, 3(2): 45–52. DOI: 10.2307/3680283
59. Zacharakis, A. (2013). Musical timbre: Bridging perception with semantics. PhD thesis, Queen Mary University of London.
