
A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction

Open Access | Feb 2024

References

  1. Alonso, J., and Erkut, C. (2021). Latent space explorations of singing voice synthesis using DDSP. In Proceedings of the Sound and Music Computing Conference (SMC), pages 183–190, Online.
  2. Askenfelt, A., Gauffin, J., Sundberg, J., and Kitzing, P. (1980). A comparison of contact microphone and electroglottograph for the measurement of vocal fundamental frequency. Journal of Speech, Language, and Hearing Research, 23(2):258–273. DOI: 10.1044/jshr.2302.258
  3. Benesty, J., Chen, J., and Huang, Y. (2008). Microphone Array Signal Processing, volume 1. Springer Verlag, 1st edition.
  4. Cano, E., FitzGerald, D., Liutkus, A., Plumbley, M. D., and Stöter, F. (2019). Musical source separation: An introduction. IEEE Signal Processing Magazine, 36(1):31–40. DOI: 10.1109/MSP.2018.2874719
  5. Cho, Y.-P., Yang, F.-R., Chang, Y.-C., Cheng, C.-T., Wang, X.-H., and Liu, Y.-W. (2021). A survey on recent deep learning-driven singing voice synthesis systems. In 2021 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), pages 319–323. DOI: 10.1109/AIVR52153.2021.00067
  6. Choi, H., Lee, J., Kim, W., Lee, J., Heo, H., and Lee, K. (2021). Neural analysis and synthesis: Reconstructing speech from self-supervised representations. In Advances in Neural Information Processing Systems (NeurIPS), pages 16251–16265, Virtual.
  7. Choi, H.-S., Yang, J., Lee, J., and Kim, H. (2022). NANSY++: Unified voice synthesis with neural analysis and synthesis. Computing Research Repository (CoRR), abs/2211.09407.
  8. Dai, J., and Dixon, S. (2019). Intonation trajectories within tones in unaccompanied soprano, alto, tenor, bass quartet singing. The Journal of the Acoustical Society of America, 146(2):1005–1014. DOI: 10.1121/1.5120483
  9. Dekens, T., Patsis, Y., Verhelst, W., Beaugendre, F., and Capman, F. (2008). A multi-sensor speech database with applications towards robust speech processing in hostile environments. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco.
  10. Engel, J., Hantrakul, L., Gu, C., and Roberts, A. (2020). DDSP: Differentiable digital signal processing. In Proceedings of the International Conference on Learning Representations (ICLR).
  11. Gannot, S., Burshtein, D., and Weinstein, E. (2001). Signal enhancement using beamforming and nonstationarity with applications to speech. IEEE Transactions on Signal Processing, 49(8):1614–1626. DOI: 10.1109/78.934132
  12. Graciarena, M., Franco, H., Sonmez, K., and Bratt, H. (2003). Combining standard and throat microphones for robust speech recognition. IEEE Signal Processing Letters, 10(3):72–74. DOI: 10.1109/LSP.2003.808549
  13. He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, Las Vegas, NV, USA. DOI: 10.1109/CVPR.2016.90
  14. Henry, P., and Letowski, T. (2007). Bone conduction: Anatomy, physiology, and communication. Technical Report ARL-TR-4138, United States Army Research Laboratory.
  15. Herbst, C. (2020). Electroglottography – an update. Journal of Voice, 34(4):503–526. DOI: 10.1016/j.jvoice.2018.12.014
  16. Hershey, S., Chaudhuri, S., Ellis, D. P. W., Gemmeke, J. F., Jansen, A., Moore, R. C., Plakal, M., Platt, D., Saurous, R. A., Seybold, B., Slaney, M., Weiss, R. J., and Wilson, K. (2017). CNN architectures for large-scale audio classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 131–135. DOI: 10.1109/ICASSP.2017.7952132
  17. Kates, J. M. (1992). On using coherence to measure distortion in hearing aids. The Journal of the Acoustical Society of America, 91(4):2236–2244. DOI: 10.1121/1.403657
  18. Kilgour, K., Zuluaga, M., Roblek, D., and Sharifi, M. (2019). Fréchet audio distance: A metric for evaluating music enhancement algorithms. Computing Research Repository (CoRR), abs/1812.08466. DOI: 10.21437/Interspeech.2019-2219
  19. Li, X., Chebiyyam, V., and Kirchhoff, K. (2019). Speech audio super-resolution for speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), pages 3416–3420, Graz, Austria. DOI: 10.21437/Interspeech.2019-3043
  20. McBride, M., Tran, P., Letowski, T., and Patrick, R. (2011). The effect of bone conduction microphone locations on speech intelligibility and sound quality. Applied Ergonomics, 42(3):495–502. DOI: 10.1016/j.apergo.2010.09.004
  21. Mitsufuji, Y., Fabbro, G., Uhlich, S., Stöter, F.-R., Défossez, A., Kim, M., Choi, W., Yu, C.-Y., and Cheuk, K.-W. (2022). Music Demixing Challenge 2021. Frontiers in Signal Processing, 1. DOI: 10.3389/frsip.2021.808395
  22. Nakajima, Y., Kashioka, H., Shikano, K., and Campbell, N. (2003). Non-audible murmur recognition input interface using stethoscopic microphone attached to the skin. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), volume 5, pages 708–711. DOI: 10.1109/ICASSP.2003.1200069
  23. Otani, M., Hirahara, T., and Adachi, S. (2006). Numerical simulation of sound originated from the vocal tract in soft neck tissues. The Journal of the Acoustical Society of America, 120(5):3352–3352. DOI: 10.1121/1.4781428
  24. Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. Technical report, OpenAI.
  25. Renault, L., Mignot, R., and Roebel, A. (2022). Differentiable piano model for MIDI-to-audio performance synthesis. In Proceedings of the 25th International Conference on Digital Audio Effects (DAFx), Vienna, Austria.
  26. Rosenzweig, S., Cuesta, H., Weis, C., Scherbaum, F., Gómez, E., and Müller, M. (2020). Dagstuhl ChoirSet: A multitrack dataset for MIR research on choral singing. Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1):98–110. DOI: 10.5334/tismir.48
  27. Rosenzweig, S., Scherbaum, F., and Müller, M. (2022). Computer-assisted analysis of field recordings: A case study of Georgian funeral songs. ACM Journal on Computing and Cultural Heritage (JOCCH), 16(1):1–16. DOI: 10.1145/3551645
  28. Scherbaum, F. (2016). On the benefit of larynx-microphone field recordings for the documentation and analysis of polyphonic vocal music. In Proceedings of the International Workshop on Folk Music Analysis, pages 80–87.
  29. Scherbaum, F., Mzhavanadze, N., Rosenzweig, S., and Müller, M. (2022). Tuning systems of traditional Georgian singing determined from a new corpus of field recordings. Musicologist, 6(2):142–168. DOI: 10.33906/musicologist.1068947
  30. Schmidt, K., and Edler, B. (2021). Blind bandwidth extension of speech based on LPCNet. In 2020 28th European Signal Processing Conference (EUSIPCO), pages 426–430. DOI: 10.23919/Eusipco47968.2020.9287465
  31. Schoeffler, M., Bartoschek, S., Stöter, F.-R., Roess, M., Westphal, S., Edler, B., and Herre, J. (2018). webMUSHRA – a comprehensive framework for web-based listening tests. Journal of Open Research Software, 6(8). DOI: 10.5334/jors.187
  32. Schulze-Forster, K., Doire, C. S. J., Richard, G., and Badeau, R. (2022). Unsupervised audio source separation using differentiable parametric source models. Computing Research Repository (CoRR), abs/2201.09592.
  33. Serrà, J., Pascual, S., Pons, J., Araz, R. O., and Scaini, D. (2022). Universal speech enhancement with score-based diffusion. Computing Research Repository (CoRR), abs/2206.03065.
  34. Serra, X., and Smith III, J. (1990). Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition. Computer Music Journal, 14(4):12–24. DOI: 10.2307/3680788
  35. Shimizu, S., Otani, M., and Hirahara, T. (2009). Frequency characteristics of several non-audible murmur (NAM) microphones. Acoustical Science and Technology, 30(2):139–142. DOI: 10.1250/ast.30.139
  36. Stupakov, A., Hanusa, E., Bilmes, J., and Fox, D. (2009). COSINE – a corpus of multi-party conversational speech in noisy environments. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’09), pages 4153–4156. DOI: 10.1109/ICASSP.2009.4960543
  37. Vincent, E., Gribonval, R., and Févotte, C. (2006). Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 14(4):1462–1469. DOI: 10.1109/TSA.2005.858005
  38. Vincent, E., Virtanen, T., and Gannot, S., editors (2018). Audio Source Separation and Speech Enhancement. Wiley, 1st edition. DOI: 10.1002/9781119279860
  39. Werner, N., Balke, S., Stöter, F.-R., Müller, M., and Edler, B. (2017). trackswitch.js: A versatile web-based audio player for presenting scientific results. In Proceedings of the Web Audio Conference (WAC), London, UK.
  40. Wu, D.-Y., Hsiao, W.-Y., Yang, F.-R., Friedman, O., Jackson, W., Bruzenak, S., Liu, Y.-W., and Yang, Y.-H. (2022). DDSP-based singing vocoders: A new subtractive-based synthesizer and a comprehensive evaluation. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 76–83, Bengaluru, India.
  41. Zhuo, L., Yuan, R., Pan, J., Ma, Y., Li, Y., Zhang, G., Liu, S., Dannenberg, R., Fu, J., Lin, C., Benetos, E., Chen, W., Xue, W., and Guo, Y. (2023). LyricWhiz: Robust multilingual zero-shot lyrics transcription by Whispering to ChatGPT. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 343–351, Milano, Italy.
DOI: https://doi.org/10.5334/tismir.166 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 10, 2023
Accepted on: Jan 6, 2024
Published on: Feb 23, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Simon Schwär, Michael Krause, Michael Fast, Sebastian Rosenzweig, Frank Scherbaum, Meinard Müller, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.