
Make That Sound More Metallic: Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder

Open Access | May 2021

References

  1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. (2016). TensorFlow: A system for large-scale machine learning. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 265–283. Savannah, GA, USA.
  2. Adel, T., Ghahramani, Z., and Weller, A. (2018). Discovering interpretable representations for both deep generative and discriminative models. In Proceedings of the International Conference on Machine Learning (ICML), pages 50–59. Stockholm, Sweden.
  3. Blaauw, M., and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Proceedings of the Conference of the International Speech Communication Association (Interspeech), San Francisco, CA, USA. DOI: 10.21437/Interspeech.2016-1183
  4. Çakir, E., and Virtanen, T. (2018). Musical instrument synthesis and morphing in multidimensional latent space using variational convolutional recurrent autoencoders. In Proceedings of the Audio Engineering Society Convention, New York, NY, USA.
  5. Cheminée, P., Gherghinoiu, C., and Besnainou, C. (2005). Analyses des verbalisations libres sur le son du piano versus analyses acoustiques. In Proceedings of the Conference on Interdisciplinary Musicology (CIM05), Montréal, Canada.
  6. Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A., and Bengio, Y. (2015). A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems (NIPS), pages 2980–2988, Montréal, Canada.
  7. Colonel, J., Curro, C., and Keene, S. (2017). Improving neural net auto-encoders for music synthesis. In Proceedings of the Audio Engineering Society Convention, New York, NY, USA.
  8. Day, W., and Edelsbrunner, H. (1984). Efficient algorithms for agglomerative hierarchical clustering methods. Journal of Classification, 1(1): 7–24. DOI: 10.1007/BF01890115
  9. Donahue, C., McAuley, J., and Puckette, M. (2019). Adversarial audio synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA.
  10. Dubois, D. (2000). Categories as acts of meaning: The case of categories in olfaction and audition. Cognitive Science Quarterly, 1(1): 35–68.
  11. Engel, J., Agrawal, K. K., Chen, S., Gulrajani, I., Donahue, C., and Roberts, A. (2019). GANSynth: Adversarial neural audio synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA.
  12. Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia.
  13. Esling, P., Chemla-Romeu-Santos, A., and Bitton, A. (2018). Generative timbre spaces with variational audio synthesis. In Proceedings of the International Conference on Digital Audio Effects (DAFx), Aveiro, Portugal.
  14. Esling, P., Masuda, N., Bardet, A., Despres, R., and Chemla-Romeu-Santos, A. (2020). Flow synthesizer: Universal audio synthesizer control with normalizing flows. Applied Sciences, 10(1): 302. DOI: 10.3390/app10010302
  15. Faure, A. (2000). Des sons aux mots, comment parle-t-on du timbre musical ? PhD thesis, École des Hautes Études en Sciences Sociales (EHESS).
  16. Fraccaro, M., Sønderby, S. K., Paquet, U., and Winther, O. (2016). Sequential neural models with stochastic layers. In Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain.
  17. Fritz, C., Blackwell, A., Cross, I., Woodhouse, J., and Moore, B. (2012). Exploring violin sound quality: Investigating English timbre descriptors and correlating resynthesized acoustical modifications with perceptual properties. The Journal of the Acoustical Society of America, 131(1): 783–794. DOI: 10.1121/1.3651790
  18. Garnier, M., Henrich, N., Castellengo, M., Sotiropoulos, D., and Dubois, D. (2007). Characterisation of voice quality in Western lyrical singing: From teachers’ judgements to acoustic descriptions. Journal of Interdisciplinary Music Studies, 1(2): 62–91.
  19. Girin, L., Hueber, T., Roche, F., and Leglaive, S. (2019). Notes on the use of variational autoencoders for speech and audio spectrogram modeling. In Proceedings of the International Conference on Digital Audio Effects (DAFx), Birmingham, UK.
  20. Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning. MIT Press. http://www.deeplearningbook.org
  21. Grey, J. (1977). Multidimensional perceptual scaling of musical timbres. The Journal of the Acoustical Society of America, 61(5): 1270–1277. DOI: 10.1121/1.381428
  22. Grey, J., and Moorer, J. (1977). Perceptual evaluations of synthesized musical instrument tones. The Journal of the Acoustical Society of America, 62(2): 454–462. DOI: 10.1121/1.381508
  23. Griffin, D., and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2): 236–243. DOI: 10.1109/TASSP.1984.1164317
  24. Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017). β-VAE: Learning basic visual concepts with a constrained variational framework. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France.
  25. Hinton, G., and Salakhutdinov, R. (2007). Using deep belief nets to learn covariance kernels for Gaussian processes. In Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada.
  26. Hsu, W.-N., Zhang, Y., and Glass, J. (2017a). Learning latent representations for speech generation and transformation. In Proceedings of the Conference of the International Speech Communication Association (Interspeech), Stockholm, Sweden. DOI: 10.21437/Interspeech.2017-349
  27. Hsu, W.-N., Zhang, Y., and Glass, J. (2017b). Unsupervised learning of disentangled and interpretable representations from sequential data. In Advances in Neural Information Processing Systems (NIPS), pages 1878–1889, Long Beach, CA, USA.
  28. Huber, R., and Kollmeier, B. (2006). PEMO-Q: A new method for objective audio quality assessment using a model of auditory perception. IEEE Transactions on Audio, Speech, and Language Processing, 14(6): 1902–1911. DOI: 10.1109/TASL.2006.883259
  29. Iverson, P., and Krumhansl, C. (1993). Isolating the dynamic attributes of musical timbre. The Journal of the Acoustical Society of America, 94(5): 2595–2603. DOI: 10.1121/1.407371
  30. Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11(2): 37–50. DOI: 10.1111/j.1469-8137.1912.tb05611.x
  31. Jillings, N., Moffat, D., De Man, B., and Reiss, J. (2015). Web Audio Evaluation Tool: A browser-based listening test environment. In Proceedings of the Sound and Music Computing Conference (SMC), Maynooth, Ireland.
  32. Kendall, R. A., Carterette, E. C., and Hajda, J. M. (1999). Perceptual and acoustical features of natural and synthetic orchestral instrument tones. Music Perception, 16(3): 327–363. DOI: 10.2307/40285796
  33. Kingma, D., and Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA.
  34. Kingma, D., and Welling, M. (2014). Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, Canada.
  35. Krimphoff, J., McAdams, S., and Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II. Analyses acoustiques et quantification psychophysique. Le Journal de Physique IV, 4(C5): C5-625. DOI: 10.1051/jp4:19945134
  36. Krishnan, R., Shalit, U., and Sontag, D. (2017). Structured inference networks for nonlinear state space models. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.
  37. Krumhansl, C. (1989). Why is musical timbre so hard to understand? Structure and Perception of Electroacoustic Sound and Music, 9: 43–53.
  38. Lichte, W. (1941). Attributes of complex tones. Journal of Experimental Psychology, 28(6): 455. DOI: 10.1037/h0053526
  39. Locatello, F., Tschannen, M., Bauer, S., Rätsch, G., Schölkopf, B., and Bachem, O. (2020). Disentangling factors of variation using few labels. In Proceedings of the International Conference on Learning Representations (ICLR).
  40. Marozeau, J., de Cheveigné, A., McAdams, S., and Winsberg, S. (2003). The dependency of timbre on fundamental frequency. The Journal of the Acoustical Society of America, 114(5): 2946–2957. DOI: 10.1121/1.1618239
  41. McAdams, S. (2019). The perceptual representation of timbre. In Timbre: Acoustics, Perception, and Cognition, pages 23–57. Springer. DOI: 10.1007/978-3-030-14832-4_2
  42. McAdams, S., Beauchamp, J., and Meneguzzi, S. (1999). Discrimination of musical instrument sounds resynthesized with simplified spectrotemporal parameters. The Journal of the Acoustical Society of America, 105(2): 882–897. DOI: 10.1121/1.426277
  43. McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., and Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research, 58(3): 177–192. DOI: 10.1007/BF00419633
  44. Miller, J., and Carterette, E. (1975). Perceptual space for musical structures. The Journal of the Acoustical Society of America, 58(3): 711–720. DOI: 10.1121/1.380719
  45. Miranda, E. (2002). Computer sound design: Synthesis techniques and programming. Music Technology series. Focal Press.
  46. Pati, A., and Lerch, A. (2020). Attribute-based regularization of latent spaces for variational autoencoders. Neural Computing and Applications, pages 1–16. DOI: 10.1007/s00521-020-05270-2
  47. Peeters, G., Giordano, B., Susini, P., Misdariis, N., and McAdams, S. (2011). The Timbre Toolbox: Extracting audio descriptors from musical signals. The Journal of the Acoustical Society of America, 130(5): 2902–2916. DOI: 10.1121/1.3642604
  48. Randolph, J. (2005). Free-marginal multirater kappa (multirater k [free]): An alternative to Fleiss’ fixed-marginal multirater kappa. In Joensuu Learning and Instruction Symposium, Joensuu, Finland.
  49. Reymore, L., and Huron, D. (2020). Using auditory imagery tasks to map the cognitive linguistic dimensions of musical instrument timbre qualia. Psychomusicology: Music, Mind, and Brain. DOI: 10.1037/pmu0000263
  50. Rezende, D., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the International Conference on Machine Learning (ICML), Beijing, China.
  51. Rezende, D., and Mohamed, S. (2015). Variational inference with normalizing flows. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France.
  52. Roche, F., Hueber, T., Limier, S., and Girin, L. (2019). Autoencoders for music sound modeling: A comparison of linear, shallow, deep, recurrent and variational models. In Proceedings of the Sound and Music Computing Conference (SMC), Málaga, Spain.
  53. Samson, S., Zatorre, R., and Ramsay, J. (1997). Multidimensional scaling of synthetic musical timbre: Perception of spectral and temporal characteristics. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 51(4): 307. DOI: 10.1037/1196-1961.51.4.307
  54. Traube, C. (2004). An interdisciplinary study of the timbre of the classical guitar. PhD thesis, McGill University.
  55. von Bismarck, G. (1974). Timbre of steady sounds: A factorial investigation of its verbal attributes. Acta Acustica united with Acustica, 30(3): 146–159.
  56. von Helmholtz, H. (1875). On the sensations of tone as a physiological basis for the theory of music. Longmans, Green. DOI: 10.1037/10838-000
  57. Wedin, L., and Goude, G. (1972). Dimension analysis of the perception of instrumental timbre. Scandinavian Journal of Psychology, 13(1): 228–240. DOI: 10.1111/j.1467-9450.1972.tb00071.x
  58. Wessel, D. (1979). Timbre space as a musical control structure. Computer Music Journal, 3: 45. DOI: 10.2307/3680283
  59. Zacharakis, A. (2013). Musical timbre: Bridging perception with semantics. PhD thesis, Queen Mary University of London.
DOI: https://doi.org/10.5334/tismir.76 | Journal eISSN: 2514-3298
Language: English
Submitted on: Sep 21, 2020
Accepted on: Mar 29, 2021
Published on: May 18, 2021
Published by: Ubiquity Press

© 2021 Fanny Roche, Thomas Hueber, Maëva Garnier, Samuel Limier, Laurent Girin, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.