
Voice Assignment in Vocal Quartets Using Deep Learning Models Based on Pitch Salience

By: Helena Cuesta and Emilia Gómez
Open Access | May 2022

References

  1. Abeßer, J., Balke, S., Frieler, K., Pfleiderer, M., and Müller, M. (2017). Deep learning for jazz walking bass transcription. In Proceedings of the AES International Conference on Semantic Audio, pages 202–209, Erlangen, Germany.
  2. Abeßer, J. and Müller, M. (2021). Jazz bass transcription using a U-Net architecture. Electronics, 10(6). DOI: 10.3390/electronics10060670
  3. Arora, V. and Behera, L. (2015). Multiple F0 estimation and source clustering of polyphonic music audio using PLCA and HMRFs. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(2):278–287. DOI: 10.1109/TASLP.2014.2387388
  4. Benetos, E. and Dixon, S. (2013). Multiple-instrument polyphonic music transcription using a temporally constrained shift-invariant model. The Journal of the Acoustical Society of America, 133(3):1727–1741. DOI: 10.1121/1.4790351
  5. Benetos, E., Dixon, S., Duan, Z., and Ewert, S. (2019). Automatic music transcription: An overview. IEEE Signal Processing Magazine, 36(1):20–30. DOI: 10.1109/MSP.2018.2869928
  6. Bittner, R. M., McFee, B., and Bello, J. P. (2018). Multitask learning for fundamental frequency estimation in music. ArXiv, abs/1809.00381.
  7. Bittner, R. M., McFee, B., Salamon, J., Li, P., and Bello, J. P. (2017). Deep salience representations for F0 tracking in polyphonic music. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 63–70, Suzhou, China.
  8. Cambouropoulos, E. (2008). Voice and stream: Perceptual and computational modeling of voice separation. Music Perception, 26(1):75–94. DOI: 10.1525/mp.2008.26.1.75
  9. Chandna, P., Cuesta, H., and Gómez, E. (2020). A deep learning based analysis-synthesis framework for unison singing. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), pages 598–604, Montreal, Canada (virtual).
  10. Chew, E. and Wu, X. (2004). Separating voices in polyphonic music: A contig mapping approach. In International Symposium on Computer Music Modeling and Retrieval, pages 1–20. Springer. DOI: 10.1007/978-3-540-31807-1_1
  11. Clift, S., Hancox, G., Morrison, I., Hess, B., Kreutz, G., and Stewart, D. (2010). Choral singing and psychological wellbeing: Quantitative and qualitative findings from English choirs in a cross-national survey. Journal of Applied Arts & Health, 1(1):19–34. DOI: 10.1386/jaah.1.1.19/1
  12. Cuesta, H. (2022). Data-driven Pitch Content Description of Choral Singing Recordings. PhD thesis, Universitat Pompeu Fabra, Barcelona.
  13. Cuesta, H., Gómez, E., and Chandna, P. (2019). A framework for multi-f0 modeling in SATB choir recordings. In Proceedings of the Sound and Music Computing Conference (SMC), pages 447–453, Málaga, Spain.
  14. Cuesta, H., Gómez, E., Martorell, A., and Loáiciga, F. (2018). Analysis of intonation in unison choir singing. In Proceedings of the International Conference on Music Perception and Cognition (ICMPC), pages 125–130, Graz, Austria.
  15. Cuesta, H., McFee, B., and Gómez, E. (2020). Multiple F0 estimation in vocal ensembles using convolutional neural networks. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), pages 302–309, Montreal, Canada (virtual).
  16. Cuthbert, M. S. and Ariza, C. (2010). Music21: A toolkit for computer-aided musicology and symbolic music data. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 637–642, Utrecht, The Netherlands.
  17. Dai, J. and Dixon, S. (2019). Singing together: Pitch accuracy and interaction in unaccompanied unison and duet singing. The Journal of the Acoustical Society of America, 145(2):663–675. DOI: 10.1121/1.5087817
  18. Devaney, J., Mandel, M. I., and Fujinaga, I. (2012). A study of intonation in three-part singing using the automatic music performance analysis and comparison toolkit (AMPACT). In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 511–516, Porto, Portugal.
  19. Duan, Z., Han, J., and Pardo, B. (2013). Multi-pitch streaming of harmonic sound mixtures. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(1):138–150. DOI: 10.1109/TASLP.2013.2285484
  20. Elsayed, N., Maida, A., and Bayoumi, M. (2019). Effects of different activation functions for unsupervised convolutional LSTM spatiotemporal learning. Advances in Science, Technology and Engineering Systems Journal, 4(2):260–269. DOI: 10.25046/aj040234
  21. Gover, M. and Depalle, P. (2020). Score-informed source separation of choral music. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), pages 231–239, Montreal, Canada (virtual).
  22. Gray, P. and Bunescu, R. (2016). A neural greedy model for voice separation in symbolic music. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), pages 782–788, New York City, USA.
  23. Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9:1735–1780. DOI: 10.1162/neco.1997.9.8.1735
  24. Huron, D. (2001). Tone and voice: A derivation of the rules of voice-leading from perceptual principles. Music Perception, 19:1–64. DOI: 10.1525/mp.2001.19.1.1
  25. Jin, Y. and Wang, M. (2020). LSTM model for single to dual track piano MIDI file. In IEEE 9th Global Conference on Consumer Electronics (GCCE), pages 29–31, Las Vegas, USA. DOI: 10.1109/GCCE50665.2020.9291967
  26. Jordanous, A. (2008). Voice separation in polyphonic music: A data-driven approach. In Proceedings of the International Computer Music Conference (ICMC), Belfast, Ireland.
  27. Kilian, J. and Hoos, H. H. (2002). Voice separation — a local optimisation approach. In Proceedings of the 3rd International Conference on Music Information Retrieval, pages 30–46, Paris, France.
  28. Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. ArXiv, abs/1412.6980.
  29. Kirlin, P. and Utgoff, P. (2005). VoiSe: Learning to segregate voices in explicit and implicit polyphony. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR), pages 552–557, London, UK.
  30. Kirsh, E. R., van Leer, E., Phero, H. J., Xie, C., and Khosla, S. (2013). Factors associated with singers’ perceptions of choral singing well-being. Journal of Voice, 27(6):786.e25. DOI: 10.1016/j.jvoice.2013.06.004
  31. Lordelo, C., Benetos, E., Dixon, S., and Ahlbäck, S. (2021). Pitch-informed instrument assignment using a deep convolutional network with multiple kernel shapes. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR).
  32. Madsen, S. T. and Widmer, G. (2006). Separating voices in MIDI. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR), pages 57–60, Victoria, BC.
  33. McLeod, A., Schramm, R., Steedman, M., and Benetos, E. (2017). Automatic transcription of polyphonic vocal music. Applied Sciences, 7(12). DOI: 10.3390/app7121285
  34. McLeod, A. and Steedman, M. (2016). HMM-based voice separation of MIDI performance. Journal of New Music Research, 45:17–26. DOI: 10.1080/09298215.2015.1136650
  35. Nakamura, E., Benetos, E., Yoshii, K., and Dixon, S. (2018). Towards complete polyphonic music transcription: Integrating multi-pitch detection and rhythm quantization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 101–105, Calgary, Canada. DOI: 10.1109/ICASSP.2018.8461914
  36. Petermann, D., Chandna, P., Cuesta, H., Bonada, J., and Gómez, E. (2020). Deep learning based source separation applied to choir ensembles. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), pages 733–739, Montreal, Canada (virtual).
  37. Raffel, C., McFee, B., Humphrey, E. J., Salamon, J., Nieto, O., Liang, D., and Ellis, D. P. (2014). mir_eval: A transparent implementation of common MIR metrics. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), pages 367–372, Taipei, Taiwan.
  38. Rosenzweig, S., Cuesta, H., Weiß, C., Scherbaum, F., Gómez, E., and Müller, M. (2020). Dagstuhl ChoirSet: A multitrack dataset for MIR research on choral singing. Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1):98–110. DOI: 10.5334/tismir.48
  39. Ryynänen, M. P. and Klapuri, A. P. (2008). Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3):72–86. DOI: 10.1162/comj.2008.32.3.72
  40. Sarkar, S., Benetos, E., and Sandler, M. (2020). Choral music separation using time-domain neural networks. In Proceedings of the DMRN+15: Digital Music Research Network Workshop, pages 7–8, London, UK.
  41. Schramm, R. and Benetos, E. (2017). Automatic transcription of a cappella recordings from multiple singers. In Proceedings of the AES Conference on Semantic Audio, Erlangen, Germany.
  42. Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., and Woo, W.-C. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NeurIPS), pages 802–810.
  43. Sigtia, S., Benetos, E., and Dixon, S. (2016). An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(5):927–939. DOI: 10.1109/TASLP.2016.2533858
  44. Su, L., Chuang, T.-Y., and Yang, Y.-H. (2016). Exploiting frequency, periodicity and harmonicity using advanced time-frequency concentration techniques for multipitch estimation of choir and symphony. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 393–399, New York City, USA.
  45. Tanaka, K., Nakatsuka, T., Nishikimi, R., Yoshii, K., and Morishima, S. (2020). Multi-instrument music transcription based on deep spherical clustering of spectrograms and pitchgrams. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 327–334, Montreal, Canada.
  46. Weiß, C., Schlecht, S. J., Rosenzweig, S., and Müller, M. (2019). Towards measuring intonation quality of choir recordings: A case study on Bruckner’s Locus Iste. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 276–283, Delft, The Netherlands.
DOI: https://doi.org/10.5334/tismir.121 | Journal eISSN: 2514-3298
Language: English
Submitted on: Oct 18, 2021
Accepted on: Mar 3, 2022
Published on: May 26, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Helena Cuesta, Emilia Gómez, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.